[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2021-05-07 Thread xiaoli (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17340593#comment-17340593
 ] 

xiaoli commented on YARN-9927:
--

mark

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Assignee: Qi Zhu
>Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf, 
> YARN-9927.001.patch, YARN-9927.002.patch, YARN-9927.003.patch, 
> YARN-9927.004.patch, YARN-9927.005.patch
>
>
> Recently, we have observed serious event blocking in RM event dispatcher 
> queue. After analysis of RM event monitoring data and RM event processing 
> logic, we found that
> 1) environment: a cluster with thousands of nodes
> 2) RMNodeStatusEvent dominates 90% time consumption of RM event scheduler
> 3) Meanwhile, RM event processing is in a single-thread mode, and It results 
> in the low headroom of RM event scheduler, thus performance of RM.
> So we proposed a RM multi-thread event processing mechanism to improve RM 
> performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2021-05-04 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17339383#comment-17339383
 ] 

Qi Zhu commented on YARN-9927:
--

Great review and investigation!

Thanks very much  [~ebadger] [~ebadger] .

I agree with you that we should do some stress test done via SLS or manually. 
And the more generic way of event handling is a great improvement in YARN.

And about the test, i will change it to test both the multi-thread and the 
single one.

 

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Assignee: Qi Zhu
>Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf, 
> YARN-9927.001.patch, YARN-9927.002.patch, YARN-9927.003.patch, 
> YARN-9927.004.patch, YARN-9927.005.patch
>
>
> Recently, we have observed serious event blocking in RM event dispatcher 
> queue. After analysis of RM event monitoring data and RM event processing 
> logic, we found that
> 1) environment: a cluster with thousands of nodes
> 2) RMNodeStatusEvent dominates 90% time consumption of RM event scheduler
> 3) Meanwhile, RM event processing is in a single-thread mode, and It results 
> in the low headroom of RM event scheduler, thus performance of RM.
> So we proposed a RM multi-thread event processing mechanism to improve RM 
> performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2021-05-03 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338237#comment-17338237
 ] 

Andras Gyori commented on YARN-9927:


Thank you [~zhuqi] for the patch. I think this approach is well thought, 
because it reuses existing logic already established elsewhere (services as 
separate threads). On this part I have one addition:
 * Please move MultiThreadDispatcher to its own file as ResourceManager already 
has quite a good amount of code.

As for RMNode event handling, I have one proposition. Usually keeping code as 
simple as possible is a good recommendation, but I do think event handling is a 
crucial part of YARN, and it might be worthwhile to provide fine tuning 
options. The RMNode event handling is a good way to improve performance, but I 
could see a value in providing a more generic way of event handling. A proof of 
concept implementation of my proposition is:
 # Create a MultiThreadEventHandler wrapper
{code:java}
 public static class MultiThreadEventHandler implements EventHandler> {
private final ThreadPoolExecutor multiHandlerThreadPool;
private final EventHandler> handler;

public MultiThreadEventHandler(EventHandler> handler,
   int maximumPoolSize) {
  this.handler = handler;
  ThreadFactory threadFactory = new ThreadFactoryBuilder()
  .setNameFormat("multiHandlerThread #%d")
  .build();
  multiHandlerThreadPool = new ThreadPoolExecutor(
  5, maximumPoolSize, 10, TimeUnit.SECONDS,
  new LinkedBlockingQueue<>(), threadFactory);
}

@Override
public void handle(Event event) {
  multiHandlerThreadPool.submit(() -> handler.handle(event));
}
  }
{code}

 # Provide configuration values to set MultiThreadEventHandler for a specific 
EventType and the MultiThreadDispatcher#register would look like this
{code:java}
 @Override
public void register(Class eventType,
EventHandler handler) {
  if (eventTypeDispatcherMap.get(eventType) == null) {
AsyncDispatcher asyncDispatcher =
createDispatcher(eventType);
eventTypeDispatcherMap.put(eventType,
asyncDispatcher);
addIfService(asyncDispatcher);
  }
  EventHandler registeredHandler = handler;
  boolean isMultiThreadEventHandler = 
getConfig().getBoolean("yarn.scheduler.event." + eventType.getCanonicalName()
  + ".multi-thread-handler.enabled", false);
  if (isMultiThreadEventHandler) {
int poolSize = getConfig().getInt("yarn.scheduler.event." + 
eventType.getCanonicalName()
+ ".multi-thread-handler.max-pool-size", 5);
registeredHandler = new MultiThreadEventHandler(handler, poolSize);
  }

  eventTypeDispatcherMap.
  get(eventType).register(eventType, registeredHandler);
}
{code}

As it was emphasised before, this is a performance critical section of YARN, 
therefore some kind of stress test done via SLS or manually would need to be 
done to make sure RM is not crippled by these changes and the performance 
increase justifies this complexity and extended hardware resource usage. 

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Assignee: Qi Zhu
>Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf, 
> YARN-9927.001.patch, YARN-9927.002.patch, YARN-9927.003.patch, 
> YARN-9927.004.patch, YARN-9927.005.patch
>
>
> Recently, we have observed serious event blocking in RM event dispatcher 
> queue. After analysis of RM event monitoring data and RM event processing 
> logic, we found that
> 1) environment: a cluster with thousands of nodes
> 2) RMNodeStatusEvent dominates 90% time consumption of RM event scheduler
> 3) Meanwhile, RM event processing is in a single-thread mode, and It results 
> in the low headroom of RM event scheduler, thus performance of RM.
> So we proposed a RM multi-thread event processing mechanism to improve RM 
> performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2021-04-29 Thread Eric Badger (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17335804#comment-17335804
 ] 

Eric Badger commented on YARN-9927:
---

{noformat}
+// Test multi thread dispatcher
+conf.setBoolean(YarnConfiguration.
+MULTI_THREAD_DISPATCHER_ENABLED, true);
{noformat}
If this is a feature that is disabled by default, I don't think we should have 
it enabled by default in all of the RM tests. I would be happier running it as 
a parameterized test with both multi and single thread dispatchers.

In general I think the patch looks reasonable, but I would like to see testing 
done to see if this makes the problem better or worse. I would think it would 
make things better, but until we run some real tests on it, we won't really 
know. So getting something similar to what [~hcarrot] provided originally would 
be good. That way we can merge this with confidence. 

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Assignee: Qi Zhu
>Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf, 
> YARN-9927.001.patch, YARN-9927.002.patch, YARN-9927.003.patch, 
> YARN-9927.004.patch, YARN-9927.005.patch
>
>
> Recently, we have observed serious event blocking in RM event dispatcher 
> queue. After analysis of RM event monitoring data and RM event processing 
> logic, we found that
> 1) environment: a cluster with thousands of nodes
> 2) RMNodeStatusEvent dominates 90% time consumption of RM event scheduler
> 3) Meanwhile, RM event processing is in a single-thread mode, and It results 
> in the low headroom of RM event scheduler, thus performance of RM.
> So we proposed a RM multi-thread event processing mechanism to improve RM 
> performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2021-04-28 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17334689#comment-17334689
 ] 

Hadoop QA commented on YARN-9927:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
45s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 2 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  3m  
4s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for 
branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 23m 
11s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
55s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
39s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
48s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  3m  
6s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
19m 44s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
39s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
42s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 31m 
19s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  6m 
18s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
28s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for 
patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
15s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
17s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 10m 
17s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
57s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
57s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
42s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
47s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
14m 55s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| 

[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2021-04-28 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17334565#comment-17334565
 ] 

Qi Zhu commented on YARN-9927:
--

[~ebadger] [~gandras] [~epayne] 

Updated a patch to improve:

1. The event produce to event queue :
Each eventType will have single async dispatcher thread, the thread number is 
small which i have counted. And each dispatcher has single event queue, events 
will not affect each other anymore.

2. The event consume from event queue and process:

I add an example to multi thread processing in handler just for RMNodeEvent. 
Different RMNodeImpl object has different write lock, i think it can be faster 
to make multi thread handle with different RMNode object.

What's your opinions about this?

Thanks.

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Assignee: Qi Zhu
>Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf, 
> YARN-9927.001.patch, YARN-9927.002.patch, YARN-9927.003.patch, 
> YARN-9927.004.patch, YARN-9927.005.patch
>
>
> Recently, we have observed serious event blocking in RM event dispatcher 
> queue. After analysis of RM event monitoring data and RM event processing 
> logic, we found that
> 1) environment: a cluster with thousands of nodes
> 2) RMNodeStatusEvent dominates 90% time consumption of RM event scheduler
> 3) Meanwhile, RM event processing is in a single-thread mode, and It results 
> in the low headroom of RM event scheduler, thus performance of RM.
> So we proposed a RM multi-thread event processing mechanism to improve RM 
> performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2021-04-13 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17320065#comment-17320065
 ] 

Hadoop QA commented on YARN-9927:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 29m  
1s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 2 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
58s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for 
branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 27m 
 8s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 12m  
2s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
52s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
57s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
59s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
22m 29s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
53s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  3m  
3s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 34m 
26s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  6m  
4s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
23s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for 
patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
 6s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
14s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  9m 
14s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
16s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
16s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
40s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
32s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
15m  1s{color} | {color:green}{color} | {color:green} patch has no errors when 
building and testing our client artifacts. {color} |
| 

[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2021-04-12 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17319898#comment-17319898
 ] 

Qi Zhu commented on YARN-9927:
--

Fixed the checkstyle in latest patch.

cc [~gandras] [~pbacsko]

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Assignee: Qi Zhu
>Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf, 
> YARN-9927.001.patch, YARN-9927.002.patch, YARN-9927.003.patch, 
> YARN-9927.004.patch
>
>
> Recently, we have observed serious event blocking in RM event dispatcher 
> queue. After analysis of RM event monitoring data and RM event processing 
> logic, we found that
> 1) environment: a cluster with thousands of nodes
> 2) RMNodeStatusEvent dominates 90% time consumption of RM event scheduler
> 3) Meanwhile, RM event processing is in a single-thread mode, and It results 
> in the low headroom of RM event scheduler, thus performance of RM.
> So we proposed a RM multi-thread event processing mechanism to improve RM 
> performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2021-04-04 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314467#comment-17314467
 ] 

Hadoop QA commented on YARN-9927:
-

| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  1m 
19s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:green}+1{color} | {color:green} {color} | {color:green}  0m  0s{color} 
| {color:green}test4tests{color} | {color:green} The patch appears to include 2 
new or modified test files. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
43s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for 
branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
11s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m  
0s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
17s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
44s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
42s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
20m  6s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
13s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
26s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 30m 
38s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  5m 
58s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for 
patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
 6s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
14s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  9m 
14s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
16s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
16s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 39s{color} | 
{color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/892/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt{color}
 | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 2 new + 
276 unchanged - 0 fixed = 278 total (was 276) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
33s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green}{color} | {color:green} The patch has no whitespace 
issues. {color} |
| 

[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2021-04-03 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314344#comment-17314344
 ] 

Hadoop QA commented on YARN-9927:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 22m  
0s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to 
include any new or modified tests. Please justify why no new tests are needed 
for this patch. Also please list what manual steps were performed to verify 
this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
49s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for 
branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
47s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
45s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
19s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
41s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
52s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
19m  8s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
30s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
23s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue} 25m 
56s{color} | {color:blue}{color} | {color:blue} Both FindBugs and SpotBugs are 
enabled, using SpotBugs. {color} |
| {color:green}+1{color} | {color:green} spotbugs {color} | {color:green}  3m 
57s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for 
patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
27s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  9m 
14s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.10+9-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  9m 
14s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
15s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_282-8u282-b08-0ubuntu1~20.04-b08 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
15s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 40s{color} | 
{color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/891/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt{color}
 | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 2 new + 
253 unchanged - 0 fixed = 255 total (was 253) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m 
49s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | 

[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2021-04-03 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17314294#comment-17314294
 ] 

Qi Zhu commented on YARN-9927:
--

[~gandras] [~pbacsko] 

Updated a patch, each eventType will have single async dispatcher thread. I 
will add unit test later.

I think we first realize the multi thread dispatcher for each eventType, so 
that the event queue will not be a problem. And about the thread number, i 
counted the eventType number, it is in reasonable range.

Then we can move to that, how can we consume the event queue faster.

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Assignee: Qi Zhu
>Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf, 
> YARN-9927.001.patch, YARN-9927.002.patch
>
>
> Recently, we have observed serious event blocking in RM event dispatcher 
> queue. After analysis of RM event monitoring data and RM event processing 
> logic, we found that
> 1) environment: a cluster with thousands of nodes
> 2) RMNodeStatusEvent dominates 90% time consumption of RM event scheduler
> 3) Meanwhile, RM event processing is in a single-thread mode, and It results 
> in the low headroom of RM event scheduler, thus performance of RM.
> So we proposed a RM multi-thread event processing mechanism to improve RM 
> performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2021-03-22 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17305941#comment-17305941
 ] 

Andras Gyori commented on YARN-9927:


Thanks [~zhuqi] for the answer, I agree with your concerns. Ideally, we could 
predefine a small threadpool on creation, and decide on EventType registration 
whether we want that kind of event to be handled separately in a thread. I 
think a separate thread for NodeEvents and AppAttempts make sense, but we will 
not need a new thread for every event type.

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Assignee: Qi Zhu
>Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf, 
> YARN-9927.001.patch
>
>
> Recently, we have observed serious event blocking in RM event dispatcher 
> queue. After analysis of RM event monitoring data and RM event processing 
> logic, we found that
> 1) environment: a cluster with thousands of nodes
> 2) RMNodeStatusEvent dominates 90% time consumption of RM event scheduler
> 3) Meanwhile, RM event processing is in a single-thread mode, and It results 
> in the low headroom of RM event scheduler, thus performance of RM.
> So we proposed a RM multi-thread event processing mechanism to improve RM 
> performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2021-03-21 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17305895#comment-17305895
 ] 

Qi Zhu commented on YARN-9927:
--

Thanks [~gandras] for investigation and reply.

I agree with you that which you suggested is a better mode, the only concern to 
me is that if the thread number will be too many, and if it will cause some 
side effect compare with the original mode.

We also should make stress test to the new mode. 

Thanks. 

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Assignee: Qi Zhu
>Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf, 
> YARN-9927.001.patch
>
>
> Recently, we have observed serious event blocking in RM event dispatcher 
> queue. After analysis of RM event monitoring data and RM event processing 
> logic, we found that
> 1) environment: a cluster with thousands of nodes
> 2) RMNodeStatusEvent dominates 90% time consumption of RM event scheduler
> 3) Meanwhile, RM event processing is in a single-thread mode, and It results 
> in the low headroom of RM event scheduler, thus performance of RM.
> So we proposed a RM multi-thread event processing mechanism to improve RM 
> performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2021-03-19 Thread Andras Gyori (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17304992#comment-17304992
 ] 

Andras Gyori commented on YARN-9927:


Thank you [~hcarrot] for raising this issue and [~zhuqi] for stepping up to 
continue this undertaking.
 These are my feedback and suggestions, which might not mirror the actual 
situation, therefore feel free to discuss the false information. I think the 
current patch is not the best approach of this problem. It does include the 
concerns already raised in this jira, and my insights as well.

The single entry point for an event is AsyncDispatcher#handle, which puts the 
event in the eventQueue, and is processed asynchronously in a single thread. 
There is no way this could be circumvented, because it is used as 
rmContext.getDispatcher() all over the place. We must retain this entry point.
 However I have a strong sense that the performance  bottleneck is actually the 
AsyncDispatcher#eventQueue (a BlockingQueue). In my opinion, the solution is 
exactly the suggestion that is already described in the documentation of 
AsyncDispatcher:
{code:java}
/**
  Dispatches {@link Event}s in a separate thread. Currently only single thread
  does that. Potentially there could be multiple channels for each event type
  class and a thread pool can be used to dispatch the events.
 */
{code}
My suggestion would be:
 # Store a new BlockingQueue for each event type in a HashMap
 # Create a new thread for each of the registered event type / eventQueue
 # Every thread is responsible for one eventQueue processing
 # The Dispatcher would map to an N:N:N (EventQueue:Thread:EventHandler) system 
(or a N:M:N where M is smaller than N in order to reduce the amount of 
threads), where N is the number of EventTypes registered

A more fine-grained solution is possible by making an M*N:M*N:N 
(EventQueue:Thread:EventHandler) system, where M is a number given on 
registration (how many thread should be processing this kind of event) and N is 
the number of EventTypes registered (as far as I am concerned the EventHandlers 
do not use locks internally, and they are thread safe). I am not sure if this 
is feasible, because of the external locks used in EventHandlers (eg. 
NodeEventHandler uses getRMNodes(), which is locked behind a ConcurrentMap -> I 
think this is the feedback which was given by [~adam.antal] and [~epayne]).

A dummy implementation of the aforementioned system would be:
{code:java}
public class ThreadedDispatcher {
private final ConcurrentMap> events;
private final ConcurrentMap> eventHandlers;

  public void register(Class eventType,
  EventHandler handler) {
  new Thread(() -> {
 EventHandler handler = eventHandlers.get(eventType);
 BlockingQueue eventQueue = events.get(eventType);
 while (!stopped && !Thread.currentThread().isInterrupted()) {
Event event = eventQueue.take();
handler.handle(event);
 }
  }).run();

  class GenericEventHandler implements EventHandler {

  public void handle(Event event) {
  events.get(event.getType()).put(event);
  }
  }

  }
{code}
cc: [~pbacsko]

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Sub-task
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Assignee: Qi Zhu
>Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf, 
> YARN-9927.001.patch
>
>
> Recently, we have observed serious event blocking in RM event dispatcher 
> queue. After analysis of RM event monitoring data and RM event processing 
> logic, we found that
> 1) environment: a cluster with thousands of nodes
> 2) RMNodeStatusEvent dominates 90% time consumption of RM event scheduler
> 3) Meanwhile, RM event processing is in a single-thread mode, and It results 
> in the low headroom of RM event scheduler, thus performance of RM.
> So we proposed a RM multi-thread event processing mechanism to improve RM 
> performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2021-02-08 Thread hcarrot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17281120#comment-17281120
 ] 

hcarrot commented on YARN-9927:
---

[~zhuqi] I will be appreciated if you can help to rebase and improve it.

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf, 
> YARN-9927.001.patch
>
>
> Recently, we have observed serious event blocking in RM event dispatcher 
> queue. After analysis of RM event monitoring data and RM event processing 
> logic, we found that
> 1) environment: a cluster with thousands of nodes
> 2) RMNodeStatusEvent dominates 90% time consumption of RM event scheduler
> 3) Meanwhile, RM event processing is in a single-thread mode, and It results 
> in the low headroom of RM event scheduler, thus performance of RM.
> So we proposed a RM multi-thread event processing mechanism to improve RM 
> performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2021-02-08 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17281119#comment-17281119
 ] 

Hadoop QA commented on YARN-9927:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime ||  Logfile || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
42s{color} | {color:blue}{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} || ||
| {color:green}+1{color} | {color:green} dupname {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} No case conflicting files 
found. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green}{color} | {color:green} The patch does not contain any 
@author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red}{color} | {color:red} The patch doesn't appear to 
include any new or modified tests. Please justify why no new tests are needed 
for this patch. Also please list what manual steps were performed to verify 
this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
55s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for 
branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
48s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 10m 
19s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
31s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
54s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
59s{color} | {color:green}{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
20m 21s{color} | {color:green}{color} | {color:green} branch has no errors when 
building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
37s{color} | {color:green}{color} | {color:green} trunk passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m 
48s{color} | {color:green}{color} | {color:green} trunk passed with JDK Private 
Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} |
| {color:blue}0{color} | {color:blue} spotbugs {color} | {color:blue}  1m 
59s{color} | {color:blue}{color} | {color:blue} Used deprecated FindBugs 
config; considering switching to SpotBugs. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  6m 
13s{color} | {color:green}{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} || ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
25s{color} | {color:blue}{color} | {color:blue} Maven dependency ordering for 
patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
 7s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m 
23s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Ubuntu-11.0.9.1+1-Ubuntu-0ubuntu1.20.04 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  8m 
23s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
49s{color} | {color:green}{color} | {color:green} the patch passed with JDK 
Private Build-1.8.0_275-8u275-b01-0ubuntu1~20.04-b01 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
49s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 43s{color} | 
{color:orange}https://ci-hadoop.apache.org/job/PreCommit-YARN-Build/604/artifact/out/diff-checkstyle-hadoop-yarn-project_hadoop-yarn.txt{color}
 | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch generated 61 new + 
269 unchanged - 0 fixed = 330 total (was 269) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
49s{color} | {color:green}{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} 

[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2021-02-08 Thread Qi Zhu (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280942#comment-17280942
 ] 

Qi Zhu commented on YARN-9927:
--

[~hcarrot] [~leftnoteasy] [~adam.antal] [~epayne]

Is it going on? I think it will be helpful in large cluster, i can help to 
rebase it.

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf, 
> YARN-9927.001.patch
>
>
> Recently, we have observed serious event blocking in RM event dispatcher 
> queue. After analysis of RM event monitoring data and RM event processing 
> logic, we found that
> 1) environment: a cluster with thousands of nodes
> 2) RMNodeStatusEvent dominates 90% time consumption of RM event scheduler
> 3) Meanwhile, RM event processing is in a single-thread mode, and It results 
> in the low headroom of RM event scheduler, thus performance of RM.
> So we proposed a RM multi-thread event processing mechanism to improve RM 
> performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2019-11-11 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16971506#comment-16971506
 ] 

Hadoop QA commented on YARN-9927:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m 
40s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  
0s{color} | {color:red} The patch doesn't appear to include any new or modified 
tests. Please justify why no new tests are needed for this patch. Also please 
list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
50s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 22m 
17s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  8m  
1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
34s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
17m 47s{color} | {color:green} branch has no errors when building and testing 
our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  4m 
33s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
2s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
16s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  2m 
 0s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  7m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  7m 
33s{color} | {color:green} the patch passed {color} |
| {color:orange}-0{color} | {color:orange} checkstyle {color} | {color:orange}  
1m 23s{color} | {color:orange} hadoop-yarn-project/hadoop-yarn: The patch 
generated 63 new + 274 unchanged - 0 fixed = 337 total (was 274) {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  2m 
22s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 
13m 46s{color} | {color:green} patch has no errors when building and testing 
our client artifacts. {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red}  1m 
30s{color} | {color:red} 
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  2m  
1s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}  0m 55s{color} 
| {color:red} hadoop-yarn-api in the patch failed. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green}  3m 
53s{color} | {color:green} hadoop-yarn-common in the patch passed. {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 84m 19s{color} 
| {color:red} hadoop-yarn-server-resourcemanager in the patch failed. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red}  0m 
41s{color} | {color:red} The patch generated 2 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}182m 43s{color} | 
{color:black} {color} |
\\
\\
|| Reason || Tests ||
| FindBugs | 
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager
 |
|  |  Nullcheck of nodeId at line 1202 of value previously dereferenced in 
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$NodeEventMultiDispatcher.handle(RMNodeEvent)
  At ResourceManager.java:1202 of value previously dereferenced in 

[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2019-11-11 Thread hcarrot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16971421#comment-16971421
 ] 

hcarrot commented on YARN-9927:
---

sorry for the first time, our YARN version is not compatible with  the origin 
version. The patch has been uploaded again.

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf, 
> YARN-9927.001.patch
>
>
> Recently, we have observed serious event blocking in RM event dispatcher 
> queue. After analysis of RM event monitoring data and RM event processing 
> logic, we found that
> 1) environment: a cluster with thousands of nodes
> 2) RMNodeStatusEvent dominates 90% time consumption of RM event scheduler
> 3) Meanwhile, RM event processing is in a single-thread mode, and It results 
> in the low headroom of RM event scheduler, thus performance of RM.
> So we proposed a RM multi-thread event processing mechanism to improve RM 
> performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2019-11-05 Thread Hadoop QA (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967336#comment-16967336
 ] 

Hadoop QA commented on YARN-9927:
-

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  0m  
0s{color} | {color:blue} Docker mode activated. {color} |
| {color:red}-1{color} | {color:red} patch {color} | {color:red}  0m  7s{color} 
| {color:red} YARN-9927 does not apply to trunk. Rebase required? Wrong Branch? 
See https://wiki.apache.org/hadoop/HowToContribute for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | YARN-9927 |
| JIRA Patch URL | 
https://issues.apache.org/jira/secure/attachment/12984902/YARN-9927.001.patch |
| Console output | 
https://builds.apache.org/job/PreCommit-YARN-Build/25101/console |
| Powered by | Apache Yetus 0.8.0   http://yetus.apache.org |


This message was automatically generated.



> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf, 
> YARN-9927.001.patch
>
>
> Recently, we have observed serious event blocking in RM event dispatcher 
> queue. After analysis of RM event monitoring data and RM event processing 
> logic, we found that
> 1) environment: a cluster with thousands of nodes
> 2) RMNodeStatusEvent dominates 90% time consumption of RM event scheduler
> 3) Meanwhile, RM event processing is in a single-thread mode, and It results 
> in the low headroom of RM event scheduler, thus performance of RM.
> So we proposed a RM multi-thread event processing mechanism to improve RM 
> performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2019-11-04 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16966849#comment-16966849
 ] 

Adam Antal commented on YARN-9927:
--

Thanks for the patch [~hcarrot]. 

Could you please name your patch aligned with the convention (something like 
YARN-9927.001.patch would be fine), and also set the status of this issue to 
Patch available so that jenkins can pick this up?

I'm ready to review it after the tests pass.

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf, 
> YARN-9927-addMultiEventDispatcher.patch
>
>
> Recently, we have observed serious event blocking in RM event dispatcher 
> queue. After analysis of RM event monitoring data and RM event processing 
> logic, we found that
> 1) environment: a cluster with thousands of nodes
> 2) RMNodeStatusEvent dominates 90% time consumption of RM event scheduler
> 3) Meanwhile, RM event processing is in a single-thread mode, and It results 
> in the low headroom of RM event scheduler, thus performance of RM.
> So we proposed a RM multi-thread event processing mechanism to improve RM 
> performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2019-10-30 Thread hcarrot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962987#comment-16962987
 ] 

hcarrot commented on YARN-9927:
---

The code has been finished. I think this mechanism can handler 
RMNodeStatusEvent from different nodes concurrently. Because the lock object of 
different RMNode state machine is different.

[~adam.antal] [~epayne]

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf, 
> YARN-9927-addMultiEventDispatcher.patch
>
>
> Recently, we have observed serious event blocking in RM event dispatcher 
> queue. After analysis of RM event monitoring data and RM event processing 
> logic, we found that
> 1) environment: a cluster with thousands of nodes
> 2) RMNodeStatusEvent dominates 90% time consumption of RM event scheduler
> 3) Meanwhile, RM event processing is in a single-thread mode, and It results 
> in the low headroom of RM event scheduler, thus performance of RM.
> So we proposed a RM multi-thread event processing mechanism to improve RM 
> performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2019-10-29 Thread Eric Payne (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962135#comment-16962135
 ] 

Eric Payne commented on YARN-9927:
--

Thank you [~hcarrot] for bringing this up and for your work in making the RM 
better, and thank you for providing the architecture document.
{quote}
bq. we just replace the time in the dispatcher queue with lock-holding time for 
each event. 
RM can process different events concurrently
{quote}
I share the same concern as [~adam.antal]. Since the code path through 
RMNodeStatusEvent is protected by locks, I think that even if multiple 
RMNodeStatusEvent events are being processed by multiple dispatcher threads at 
the same time, only one will actually be running. Unless the design is to 
dedicate one thread to handling only RMNodeStatusEvent events and the other 
threads to handling non-RMNodeStatusEvent events.

I look forward to seeing your POC.


> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf
>
>
> Recently, we have observed serious event blocking in RM event dispatcher 
> queue. After analysis of RM event monitoring data and RM event processing 
> logic, we found that
> 1) environment: a cluster with thousands of nodes
> 2) RMNodeStatusEvent dominates 90% time consumption of RM event scheduler
> 3) Meanwhile, RM event processing is in a single-thread mode, and It results 
> in the low headroom of RM event scheduler, thus performance of RM.
> So we proposed a RM multi-thread event processing mechanism to improve RM 
> performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2019-10-22 Thread Wangda Tan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957378#comment-16957378
 ] 

Wangda Tan commented on YARN-9927:
--

Thanks [~hcarrot] for working on this.

Tagging: [~prabhujoseph] , [~jhung] ,[~sunil.gov...@gmail.com] , [~epayne] for 
review.

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf
>
>
> Recently, we have observed serious event blocking in RM event dispatcher 
> queue. After analysis of RM event monitoring data and RM event processing 
> logic, we found that
> 1) environment: a cluster with thousands of nodes
> 2) RMNodeStatusEvent dominates 90% time consumption of RM event scheduler
> 3) Meanwhile, RM event processing is in a single-thread mode, and It results 
> in the low headroom of RM event scheduler, thus performance of RM.
> So we proposed a RM multi-thread event processing mechanism to improve RM 
> performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2019-10-22 Thread zhoukang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16957037#comment-16957037
 ] 

zhoukang commented on YARN-9927:


nice idea, we also want to do similar job. looking forward for the poc 
[~hcarrot]

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf
>
>
> Recently, we have observed serious event blocking in RM event dispatcher 
> queue. After analysis of RM event monitoring data and RM event processing 
> logic, we found that
> 1) environment: a cluster with thousands of nodes
> 2) RMNodeStatusEvent dominates 90% time consumption of RM event scheduler
> 3) Meanwhile, RM event processing is in a single-thread mode, and It results 
> in the low headroom of RM event scheduler, thus performance of RM.
> So we proposed a RM multi-thread event processing mechanism to improve RM 
> performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2019-10-22 Thread hcarrot (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16956949#comment-16956949
 ] 

hcarrot commented on YARN-9927:
---

The performance bottleneck is the single-thread RMEventDispatcher mode. Events 
are processed one by one. If we change single-thread to multi-thread, RM can 
process different events concurrently.

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Priority: Major
> Attachments: RM multi-thread event processing mechanism.pdf
>
>
> Recently, we have observed serious event blocking in RM event dispatcher 
> queue. After analysis of RM event monitoring data and RM event processing 
> logic, we found that
> 1) environment: a cluster with thousands of nodes
> 2) RMNodeStatusEvent dominates 90% time consumption of RM event scheduler
> 3) Meanwhile, RM event processing is in a single-thread mode, and It results 
> in the low headroom of RM event scheduler, thus performance of RM.
> So we proposed a RM multi-thread event processing mechanism to improve RM 
> performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-9927) RM multi-thread event processing mechanism

2019-10-22 Thread Adam Antal (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-9927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16956764#comment-16956764
 ] 

Adam Antal commented on YARN-9927:
--

Thanks for filing this [~hcarrot], interesting approach.

One question that came to my mind is that: are you certain the the dispatcher 
is the real bottleneck here? 
I mean if an event processing requires holding the lock the whole time, then we 
just replace the time in the dispatcher queue with lock-holding time for each 
event. We should dig down that for a certain event type how long the lock 
should be hold.

> RM multi-thread event processing mechanism
> --
>
> Key: YARN-9927
> URL: https://issues.apache.org/jira/browse/YARN-9927
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: yarn
>Affects Versions: 3.0.0, 2.9.2
>Reporter: hcarrot
>Priority: Minor
> Attachments: RM multi-thread event processing mechanism.pdf
>
>
> Recently, we have observed serious event blocking in RM event dispatcher 
> queue. After analysis of RM event monitoring data and RM event processing 
> logic, we found that the proportion of RMNodeStatusEvent is less than other 
> events, but the overall processing time of it is more than other events. 
> Meanwhile, RM event processing is in a single-thread mode, and It results in 
> the decrease of RM's performance. So we proposed a RM multi-thread event 
> processing mechanism to improve RM performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org