[ 
https://issues.apache.org/jira/browse/YARN-8995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16861741#comment-16861741
 ] 

Tao Yang commented on YARN-8995:
--------------------------------

Thanks [~zhuqi] for updating the patch.
Comments about the new patch:
* For the latest event, I didn't mean that it should be controlled separately 
from the counter info. We can add a boolean flag that defaults to false, is set 
to true when printing of the details is triggered (for example, when the queue 
size has reached N*5000), and is reset to false after the latest event has been 
printed.
* Configuration reading logic should be moved to serviceStart() for better 
performance.
* The printEventQueueDetails method can be simplified via the stream API. 
Moreover, the value type of counterMap should be Long instead of long[].
* The new configuration entry should have a clear name, for example 
"yarn.dispatcher.print-events-debug-info.interval-in-thousands". That is just 
off the top of my head; you can give it a better name. I suppose we should take 
thousands as the unit, since printing is already gated by another condition 
(qSize % 1000 == 0).
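
To illustrate the flag and the stream-based counting suggested above, here is a 
rough, self-contained sketch. It is not taken from the patch: the class name, 
the method names, and the use of plain strings for event types are all 
hypothetical, and the constructor argument only stands in for whatever 
configuration entry is finally chosen.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical stand-in for the dispatcher's debug logic; not the patch's code.
class EventQueueDebugSketch {
  // Assumed meaning of the proposed setting: print details every N thousand events.
  private final int intervalInThousands;
  // Defaults to false; set when details are printed, cleared after the latest
  // event has been printed.
  private boolean printLatestEvent = false;

  EventQueueDebugSketch(int intervalInThousands) {
    this.intervalInThousands = intervalInThousands;
  }

  // Counts queued events per type with the stream API; values are Long, not long[].
  static Map<String, Long> countByType(List<String> eventTypes) {
    return eventTypes.stream()
        .collect(Collectors.groupingBy(
            Function.identity(), TreeMap::new, Collectors.counting()));
  }

  // True when the queue size is a positive multiple of intervalInThousands * 1000.
  // A trigger also arms the flag for printing the latest event.
  boolean shouldPrintDetails(int queueSize) {
    boolean trigger =
        queueSize > 0 && queueSize % (intervalInThousands * 1000) == 0;
    if (trigger) {
      printLatestEvent = true;
    }
    return trigger;
  }

  // Returns the flag's current value and resets it to false, so the latest
  // event is printed at most once per trigger.
  boolean consumeLatestEventFlag() {
    boolean print = printLatestEvent;
    printLatestEvent = false;
    return print;
  }

  public static void main(String[] args) {
    EventQueueDebugSketch sketch = new EventQueueDebugSketch(5);
    List<String> queued = Arrays.asList("NODE_UPDATE", "APP_ADDED", "NODE_UPDATE");
    if (sketch.shouldPrintDetails(5000)) {
      System.out.println("Event counts: " + countByType(queued));
    }
    if (sketch.consumeLatestEventFlag()) {
      System.out.println("Latest event: " + queued.get(queued.size() - 1));
    }
  }
}
```

In the real dispatcher the interval would be read once from the configuration 
rather than passed in, and the counting would run over the actual event queue.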

> Log the event type of the too big AsyncDispatcher event queue size, and add 
> the information to the metrics. 
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-8995
>                 URL: https://issues.apache.org/jira/browse/YARN-8995
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: metrics, nodemanager, resourcemanager
>    Affects Versions: 3.2.0
>            Reporter: zhuqi
>            Assignee: zhuqi
>            Priority: Major
>         Attachments: YARN-8995.001.patch, YARN-8995.002.patch
>
>
> In our growing cluster, there are unexpected situations in which some event 
> queues degrade the performance of the cluster, such as the bug in 
> https://issues.apache.org/jira/browse/YARN-5262 . I think it's necessary to 
> log the event types when an event queue grows too large, and add the 
> information to the metrics. The queue size threshold should be a parameter 
> that can be changed.



