[ 
https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377040#comment-14377040
 ] 

Wangda Tan commented on YARN-2901:
----------------------------------

Hi [~vvasudev],

I spent some time take a look at Log4JMetricsAppeneder implementation (will 
include other modified component in next round).

1) Log4jMetricsAppender, 
1.1 Better to place in yarn-server-common?
1.2 If you agree above, how about put into package o.a.h.y.server.metrics (or 
utils)?
1.3 Rename it to Log4jWarnErrorMetricsAppender?
1.4 Comments about implementation:
I think currently, implementation of cleanup can be improved, now cutoff 
process of message/count is basically loop all items stored, which could be 
inefficient (imaging if number of stored message > threshold), existing logics 
in the patch would lead to lots of potential stored message (tons of messages 
could be genereated in 5 min, which is purge message task run interval).

If you can make the data structure to be:
SortedMap<String, SortedMap<Long, Integer>> errors (and warnings), the outside 
map is sorted by value (SortedMap with smallest timestamp goes first), and 
inside map is sorted by key (smallest timestamp goes first), purge can happen 
when we add any event, it will just take at most log(N=500) time to do the 
purge, and no extra timer task needed.

To make SortedMap can sort by value, one way to do that can refer to 
http://stackoverflow.com/questions/109383/how-to-sort-a-mapkey-value-on-the-values-in-java
 (first answer).

Here, value = SortedMap<Long, Integer>>, we can sort the SortedMaps according 
to smallest key in each SortedMap.

And one corner case may need to consider is, it is possible a same message can 
have lots of different timestamps, so we need purge the inner SortedMap too.

To make better code readability, you can wrap the SortedMap to a inner class 
like MessageInfo.

> Add errors and warning stats to RM, NM web UI
> ---------------------------------------------
>
>                 Key: YARN-2901
>                 URL: https://issues.apache.org/jira/browse/YARN-2901
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager, resourcemanager
>            Reporter: Varun Vasudev
>            Assignee: Varun Vasudev
>         Attachments: Exception collapsed.png, Exception expanded.jpg, Screen 
> Shot 2015-03-19 at 7.40.02 PM.png, apache-yarn-2901.0.patch, 
> apache-yarn-2901.1.patch
>
>
> It would be really useful to have statistics on the number of errors and 
> warnings in the RM and NM web UI. I'm thinking about -
> 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day
> 2. The top 'n'(20?) most common exceptions in the past 5 min/1 hour/12 
> hours/day
> By errors and warnings I'm referring to the log level.
> I suspect we can probably achieve this by writing a custom appender?(I'm open 
> to suggestions on alternate mechanisms for implementing this).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to