[
https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377040#comment-14377040
]
Wangda Tan commented on YARN-2901:
----------------------------------
Hi [~vvasudev],
I spent some time looking at the Log4jMetricsAppender implementation (I will
cover the other modified components in the next round).
1) Log4jMetricsAppender:
1.1 Would it be better to place it in yarn-server-common?
1.2 If you agree with the above, how about putting it into the package
o.a.h.y.server.metrics (or utils)?
1.3 Rename it to Log4jWarnErrorMetricsAppender?
1.4 Comments about the implementation:
I think the cleanup implementation can be improved. Right now the cutoff
process for messages/counts basically loops over all stored items, which could
be inefficient (imagine the number of stored messages exceeding the threshold).
The existing logic in the patch could also let a large number of messages
accumulate, since tons of messages could be generated in 5 min, which is the
interval at which the purge task runs.
If you change the data structure to
SortedMap<String, SortedMap<Long, Integer>> errors (and the same for warnings),
where the outer map is sorted by value (the SortedMap with the smallest
timestamp goes first) and the inner map is sorted by key (smallest timestamp
goes first), the purge can happen whenever we add an event. It will take at
most O(log N) time (N = 500) to do the purge, and no extra timer task is
needed.
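To illustrate the inner map only, here is a minimal sketch of a timestamp-to-count map that purges itself on every add. The class name WindowedCount and the window parameter are made up for this example and are not from the patch; it also assumes Java 8 for Map.merge.

```java
import java.util.TreeMap;

/** Hypothetical per-message counter; name and window length are illustrative. */
final class WindowedCount {
    // Sorted by key, so the smallest (oldest) timestamp comes first.
    private final TreeMap<Long, Integer> countsByTime = new TreeMap<>();
    private final long windowMillis;

    WindowedCount(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Records one event, then drops entries that fell out of the window. */
    void add(long timestampMillis) {
        countsByTime.merge(timestampMillis, 1, Integer::sum);
        // headMap is a live view of the keys strictly below the cutoff, so
        // clearing it removes the expired entries without touching the rest.
        countsByTime.headMap(timestampMillis - windowMillis, false).clear();
    }

    /** Total number of events still inside the window. */
    int total() {
        int sum = 0;
        for (int c : countsByTime.values()) {
            sum += c;
        }
        return sum;
    }

    /** Number of distinct timestamps currently stored. */
    int distinctTimestamps() {
        return countsByTime.size();
    }
}
```

Since the purge runs as part of add(), no timer task is involved; the outer ordering across messages is a separate question.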
To make a SortedMap sort by value, one option is described in
http://stackoverflow.com/questions/109383/how-to-sort-a-mapkey-value-on-the-values-in-java
(first answer).
Here value = SortedMap<Long, Integer>, so we can order the SortedMaps by the
smallest key in each SortedMap.
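As a rough sketch of that ordering, the comparator below compares two inner maps by their firstKey() and uses it to sort a copy of the outer map's entries. The class and method names are hypothetical, and it assumes every inner map is non-empty (firstKey() throws on an empty map). Note that sorting a copy like this costs O(N log N) per call, so it only shows the ordering itself, not an efficient purge.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

/** Hypothetical helper; assumes all inner maps are non-empty. */
final class OldestFirst {
    /** Orders timestamp maps by their smallest key (oldest timestamp). */
    static final Comparator<SortedMap<Long, Integer>> BY_FIRST_KEY =
        Comparator.comparingLong((SortedMap<Long, Integer> m) -> m.firstKey());

    /** Returns the entries so that the message with the oldest event is first. */
    static List<Map.Entry<String, SortedMap<Long, Integer>>> sortByOldest(
            Map<String, SortedMap<Long, Integer>> byMessage) {
        List<Map.Entry<String, SortedMap<Long, Integer>>> entries =
            new ArrayList<>(byMessage.entrySet());
        entries.sort(Map.Entry.comparingByValue(BY_FIRST_KEY));
        return entries;
    }
}
```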
One corner case to consider: the same message can have lots of different
timestamps, so we need to purge the inner SortedMap too.
For better code readability, you can wrap the SortedMap in an inner class
such as MessageInfo.
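Putting the pieces together, one possible shape is sketched below. It is not a map literally sorted by value: it keeps a plain HashMap for lookup plus a TreeSet of MessageInfo ordered by oldest timestamp as a secondary index, which is a substitution made for this example. RecentMessages, its methods and the window parameter are all hypothetical names, and Java 8 is assumed.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical store; names, window and API are illustrative only. */
final class RecentMessages {

    /** Wraps the per-message timestamp -> count map. */
    static final class MessageInfo {
        final String message;
        final TreeMap<Long, Integer> counts = new TreeMap<>();

        MessageInfo(String message) {
            this.message = message;
        }

        long oldest() {
            return counts.firstKey();
        }
    }

    private final long windowMillis;
    private final Map<String, MessageInfo> byMessage = new HashMap<>();
    // Holds only non-empty infos, oldest timestamp first; message breaks ties.
    private final TreeSet<MessageInfo> byOldest = new TreeSet<>(
        Comparator.comparingLong(MessageInfo::oldest)
                  .thenComparing((MessageInfo i) -> i.message));

    RecentMessages(long windowMillis) {
        this.windowMillis = windowMillis;
    }

    /** Records one event and purges everything older than the window. */
    void add(String message, long timestampMillis) {
        MessageInfo info = byMessage.get(message);
        if (info == null) {
            info = new MessageInfo(message);
            byMessage.put(message, info);
        } else {
            // Take it out of the index before its sort key can change.
            byOldest.remove(info);
        }
        info.counts.merge(timestampMillis, 1, Integer::sum);
        byOldest.add(info);
        purge(timestampMillis - windowMillis);
    }

    private void purge(long cutoff) {
        while (!byOldest.isEmpty() && byOldest.first().oldest() < cutoff) {
            MessageInfo info = byOldest.pollFirst();
            // Inner purge: the same message may carry many old timestamps.
            info.counts.headMap(cutoff, false).clear();
            if (info.counts.isEmpty()) {
                byMessage.remove(info.message);
            } else {
                byOldest.add(info); // re-insert under its new oldest timestamp
            }
        }
    }

    /** Events recorded for this message that are still inside the window. */
    int count(String message) {
        MessageInfo info = byMessage.get(message);
        if (info == null) {
            return 0;
        }
        int sum = 0;
        for (int c : info.counts.values()) {
            sum += c;
        }
        return sum;
    }

    /** Number of distinct messages currently stored. */
    int size() {
        return byMessage.size();
    }
}
```

Each index update is a TreeSet operation, so it stays logarithmic in the number of stored messages, and the purge only visits messages that actually hold expired entries.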
> Add errors and warning stats to RM, NM web UI
> ---------------------------------------------
>
> Key: YARN-2901
> URL: https://issues.apache.org/jira/browse/YARN-2901
> Project: Hadoop YARN
> Issue Type: New Feature
> Components: nodemanager, resourcemanager
> Reporter: Varun Vasudev
> Assignee: Varun Vasudev
> Attachments: Exception collapsed.png, Exception expanded.jpg, Screen
> Shot 2015-03-19 at 7.40.02 PM.png, apache-yarn-2901.0.patch,
> apache-yarn-2901.1.patch
>
>
> It would be really useful to have statistics on the number of errors and
> warnings in the RM and NM web UI. I'm thinking about -
> 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day
> 2. The top 'n'(20?) most common exceptions in the past 5 min/1 hour/12
> hours/day
> By errors and warnings I'm referring to the log level.
> I suspect we can probably achieve this by writing a custom appender?(I'm open
> to suggestions on alternate mechanisms for implementing this).
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)