[ https://issues.apache.org/jira/browse/YARN-9618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17305978#comment-17305978 ]

Qi Zhu edited comment on YARN-9618 at 3/22/21, 8:25 AM:
--------------------------------------------------------

Thanks [~gandras] for digging deep into this.

You are right: the main performance gain here comes from eliminating the unnecessary
back reference to rmDispatcher for RMAppNodeUpdateEvent.
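A minimal Java sketch of the difference, using hypothetical names (`FanOutSketch`, `AppNodeUpdate`, `reRouteThroughDispatcher`, `callHandlersDirectly`) rather than the real YARN classes, with an `ArrayDeque` standing in for rmDispatcher#eventQueue. The first method models the back-reference pattern, where handling one node event posts one RMAppNodeUpdateEvent-like object per running app onto the shared queue; the second hands each object straight to the per-app handler, so the shared queue does not grow.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical sketch of the two patterns; these are not Hadoop YARN classes.
public class FanOutSketch {

    /** Stand-in for RMAppNodeUpdateEvent: which node changed, which app is notified. */
    static final class AppNodeUpdate {
        final String nodeId;
        final String appId;

        AppNodeUpdate(String nodeId, String appId) {
            this.nodeId = nodeId;
            this.appId = appId;
        }
    }

    /** Back-reference pattern: re-post one event per running app onto the shared queue. */
    static void reRouteThroughDispatcher(String nodeId, List<String> runningApps,
                                         Queue<AppNodeUpdate> sharedQueue) {
        for (String appId : runningApps) {
            sharedQueue.add(new AppNodeUpdate(nodeId, appId));
        }
    }

    /** Direct pattern: invoke the per-app handler in place; nothing is enqueued. */
    static void callHandlersDirectly(String nodeId, List<String> runningApps,
                                     Consumer<AppNodeUpdate> appHandler) {
        for (String appId : runningApps) {
            appHandler.accept(new AppNodeUpdate(nodeId, appId));
        }
    }

    public static void main(String[] args) {
        List<String> apps = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {
            apps.add("app_" + i);
        }

        Queue<AppNodeUpdate> sharedQueue = new ArrayDeque<>();
        reRouteThroughDispatcher("node_1", apps, sharedQueue);
        System.out.println("re-posted to shared queue: " + sharedQueue.size()); // 1000

        List<AppNodeUpdate> handled = new ArrayList<>();
        callHandlersDirectly("node_1", apps, handled::add);
        System.out.println("handled directly: " + handled.size()); // 1000, shared queue untouched
    }
}
```

Both patterns do the same per-app work; the difference is that the first also pays for 1K extra enqueue/dequeue operations on the shared queue per node event, behind whatever else is already waiting there.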

Actually, the reason we use a separate async dispatcher here is to keep
rmDispatcher#eventQueue from blowing up and delaying the processing of other
events. The burst of events is moved onto nodeListManagerDispatcher#eventQueue instead.
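A minimal sketch of the isolation idea, assuming a hypothetical single-threaded `MiniAsyncDispatcher` (one `LinkedBlockingQueue`, one worker thread) in place of Hadoop's actual AsyncDispatcher; the class, its methods, and the event values are illustrative only. Events are dispatched before the workers start so the queue sizes can be printed deterministically: the burst sits entirely on the secondary queue, while the main queue holds only the one ordinary event.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch; NOT org.apache.hadoop.yarn.event.AsyncDispatcher.
public class SeparateDispatcherSketch {

    /** Single-threaded dispatcher: one unbounded queue, one worker draining it in order. */
    static final class MiniAsyncDispatcher<E> {
        private final BlockingQueue<E> eventQueue = new LinkedBlockingQueue<>();
        private final Thread worker;

        MiniAsyncDispatcher(String name, Consumer<E> handler) {
            worker = new Thread(() -> {
                try {
                    while (true) {
                        handler.accept(eventQueue.take()); // blocks until an event arrives
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // stop() interrupts the worker
                }
            }, name);
            worker.setDaemon(true);
        }

        void start() { worker.start(); }

        void stop() { worker.interrupt(); }

        void dispatch(E event) { eventQueue.add(event); }

        int pending() { return eventQueue.size(); }
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch otherEventHandled = new CountDownLatch(1);
        CountDownLatch burstHandled = new CountDownLatch(5_000);

        // Plays the role of rmDispatcher: only ordinary events go here.
        MiniAsyncDispatcher<String> rmDispatcher =
                new MiniAsyncDispatcher<>("rm", e -> otherEventHandled.countDown());
        // Plays the role of nodeListManagerDispatcher: absorbs the node-update burst.
        MiniAsyncDispatcher<Integer> nodeListDispatcher =
                new MiniAsyncDispatcher<>("nodeList", e -> burstHandled.countDown());

        for (int i = 0; i < 5_000; i++) {
            nodeListDispatcher.dispatch(i);
        }
        rmDispatcher.dispatch("OTHER_EVENT");
        System.out.println("rm pending: " + rmDispatcher.pending()
                + ", nodeList pending: " + nodeListDispatcher.pending()); // rm pending: 1, nodeList pending: 5000

        rmDispatcher.start();
        nodeListDispatcher.start();
        boolean other = otherEventHandled.await(5, TimeUnit.SECONDS);
        boolean burst = burstHandled.await(5, TimeUnit.SECONDS);
        System.out.println("other event handled: " + other + ", burst drained: " + burst);

        rmDispatcher.stop();
        nodeListDispatcher.stop();
    }
}
```

The secondary queue in this sketch is still unbounded, so a heavy enough burst can grow it just as large; the separation only protects the processing of other events on the main queue.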

But nodeListManagerDispatcher#eventQueue can also blow up under heavy load.
Keeping nodeListManagerDispatcher#eventQueue from filling up under heavy load is
a separate problem that this issue will not handle; we can discuss it in the
multi-threading related issues.

Or should we remove the async dispatcher here and just keep the elimination of the
unnecessary back reference to rmDispatcher for RMAppNodeUpdateEvent?

cc [~pbacsko] [~ebadger]

What's your opinion?

Thanks.



> NodeListManager event improvement
> ---------------------------------
>
>                 Key: YARN-9618
>                 URL: https://issues.apache.org/jira/browse/YARN-9618
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Bibin Chundatt
>            Assignee: Qi Zhu
>            Priority: Critical
>         Attachments: YARN-9618.001.patch, YARN-9618.002.patch, 
> YARN-9618.003.patch, YARN-9618.004.patch, YARN-9618.005.patch
>
>
> In the current implementation, NodeListManager events block the async dispatcher,
> which can crash the RM and slow down event processing.
> # On a cluster restart with 1K running apps, each usable-node event creates 1K
> events, so a 5K-node cluster could see 5K*1K events in total.
> # Event processing is blocked until the new events have been added to the queue.
> Solution:
> # Add another async event handler, similar to the scheduler's.
> # Instead of adding events to the dispatcher, call the RMApp event handler directly.
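A back-of-the-envelope version of the numbers in the description, written as a simplified Java model. It assumes exactly one usable-node event per node during the restart and counts only entries added to the central dispatcher queue; `FanOutEstimate` and its methods are hypothetical names used for illustration.

```java
// Hypothetical back-of-the-envelope model; figures taken from the description above.
public class FanOutEstimate {

    /** App-level events produced when every node event fans out to every running app. */
    static long fannedOutAppEvents(long nodeEvents, long runningApps) {
        return nodeEvents * runningApps;
    }

    /** Queue entries if each fanned-out app event is re-posted to the central dispatcher. */
    static long queuedWithReposting(long nodeEvents, long runningApps) {
        return nodeEvents + fannedOutAppEvents(nodeEvents, runningApps);
    }

    /** Queue entries if RMApp handlers are called directly: only the node events are queued. */
    static long queuedWithDirectCalls(long nodeEvents) {
        return nodeEvents;
    }

    public static void main(String[] args) {
        long nodes = 5_000; // 5K-node cluster, assuming one usable-node event per node
        long apps = 1_000;  // 1K running apps
        System.out.println(fannedOutAppEvents(nodes, apps));  // 5000000
        System.out.println(queuedWithReposting(nodes, apps)); // 5005000
        System.out.println(queuedWithDirectCalls(nodes));     // 5000
    }
}
```

Under this model the per-app handler work (5M invocations) is the same either way; calling the handlers directly only removes those 5M entries from the dispatcher queue.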



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
