[ 
https://issues.apache.org/jira/browse/YARN-10739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhanqi Cai updated YARN-10739:
------------------------------
    Attachment:     (was: Queue_Details.patch)

> GenericEventHandler.printEventQueueDetails cause RM recovery cost too much 
> time
> -------------------------------------------------------------------------------
>
>                 Key: YARN-10739
>                 URL: https://issues.apache.org/jira/browse/YARN-10739
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.4.0, 3.3.1, 3.2.3
>            Reporter: Zhanqi Cai
>            Priority: Critical
>         Attachments: YARN-10739-001.patch
>
>
> YARN-10642 added GenericEventHandler.printEventQueueDetails to 
> AsyncDispatcher. When the event queue is very large, 
> printEventQueueDetails takes a long time to run, and the RM becomes slow to 
> process events.
> For example:
> Suppose a cluster has 4K nodes and 4K running apps. When we do an RM 
> switch, every NodeManager registers with the RM again, and for each node 
> the RM calls NodesListManager, which sends an RMAppNodeUpdateEvent to 
> every app, as in the code below:
> for (RMApp app : rmContext.getRMApps().values()) {
>   if (!app.isAppFinalStateStored()) {
>     this.rmContext
>         .getDispatcher()
>         .getEventHandler()
>         .handle(
>             new RMAppNodeUpdateEvent(app.getApplicationId(), eventNode,
>                 appNodeUpdateType));
>   }
> }
> So the total number of events is 4K * 4K = 16 million. During this window, 
> GenericEventHandler.printEventQueueDetails is called frequently to print 
> the event queue details. Once the queue size exceeds 1 million, iterating 
> over the queue in printEventQueueDetails becomes very slow, as shown below:
> private void printEventQueueDetails() {
>   Iterator<Event> iterator = eventQueue.iterator();
>   Map<Enum, Long> counterMap = new HashMap<>();
>   while (iterator.hasNext()) {
>     Enum eventType = iterator.next().getType();
>     ...
> As a result, RM recovery takes far too long.
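
To illustrate the cost described above, here is a minimal, self-contained
sketch. It is not the YARN code and not necessarily what the attached patch
does. The class QueueDetailsSketch, its EventType enum, and the
maybeCountByType throttle are all hypothetical names invented for this
example. It shows that counting events by type requires a full walk of the
queue on every call, and how limiting the scan to once per interval would
bound that cost.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for a dispatcher's event queue; not the YARN class.
class QueueDetailsSketch {
  // Hypothetical event types used only for this illustration.
  public enum EventType { NODE_UPDATE, APP_ATTEMPT_ADDED }

  private final BlockingQueue<EventType> eventQueue =
      new LinkedBlockingQueue<>();
  private final long minIntervalMs;  // assumed throttle interval
  private long lastScanMs;
  private boolean scannedOnce;

  public QueueDetailsSketch(long minIntervalMs) {
    this.minIntervalMs = minIntervalMs;
  }

  public void enqueue(EventType type) {
    eventQueue.add(type);
  }

  // Full scan: visits every queued event, so the cost grows linearly
  // with the queue size, like the iterator loop in the report.
  public Map<EventType, Long> countByType() {
    Map<EventType, Long> counterMap = new EnumMap<>(EventType.class);
    for (EventType type : eventQueue) {
      counterMap.merge(type, 1L, Long::sum);
    }
    return counterMap;
  }

  // Throttled scan: returns null when the previous scan happened less
  // than minIntervalMs ago, so the queue is not walked on every call.
  public Map<EventType, Long> maybeCountByType(long nowMs) {
    if (scannedOnce && nowMs - lastScanMs < minIntervalMs) {
      return null;
    }
    scannedOnce = true;
    lastScanMs = nowMs;
    return countByType();
  }

  public static void main(String[] args) {
    QueueDetailsSketch sketch = new QueueDetailsSketch(60_000L);
    for (int i = 0; i < 1_000_000; i++) {
      sketch.enqueue(EventType.NODE_UPDATE);
    }
    long start = System.nanoTime();
    Map<EventType, Long> counts =
        sketch.maybeCountByType(System.currentTimeMillis());
    long elapsedMs = (System.nanoTime() - start) / 1_000_000;
    System.out.println("counts=" + counts + ", scan took " + elapsedMs + " ms");
    // A second call inside the interval skips the scan entirely.
    boolean skipped =
        sketch.maybeCountByType(System.currentTimeMillis()) == null;
    System.out.println("second call skipped: " + skipped);
  }
}
```

The throttle takes the current time as a parameter so the behaviour can be
checked without sleeping. Other approaches, such as maintaining per-type
counters as events are added and removed, or printing from a separate thread,
would avoid the scan on the dispatch path as well.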


