Wangda Tan commented on YARN-2901:

bq. On a vm on my macbook pro, the appender was able to process randomly 
generated error messages at 150K messages per minute without any trouble.
Sounds good to me!

bq. I think I would prefer to support the use case for arbitrary time limits 
for now. The summation occurs only when a user loads the web ui; so the cost 
shouldn't be too bad. I don't think the interval tree makes it any easier and 
makes the implementation much more complex.
Yeah, I don't like to use an overly complex data structure before it's really 
needed. I agree with what you said; let's keep it as-is and change it only if 
really required.
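
To illustrate why summing at read time should stay cheap, here is a minimal 
sketch. It is not the code from the patch: the class name ErrorCounts, the 
per-second buckets and both method names are assumptions made up for this 
example. Counts are kept in a sorted map and summed only when a caller asks for 
a given time limit.

```java
import java.util.TreeMap;

/** Hypothetical sketch; names and bucket size are assumptions, not the patch. */
class ErrorCounts {
  // Number of logged events per epoch second, kept sorted by time.
  private final TreeMap<Long, Integer> countsBySecond = new TreeMap<>();

  /** Called on the logging path: one cheap map update per event. */
  synchronized void record(long epochSeconds) {
    countsBySecond.merge(epochSeconds, 1, Integer::sum);
  }

  /** Called only when the web UI is loaded: sums buckets at or after the cutoff. */
  synchronized int countSince(long cutoffEpochSeconds) {
    int total = 0;
    for (int count : countsBySecond.tailMap(cutoffEpochSeconds, true).values()) {
      total += count;
    }
    return total;
  }
}
```

Any cutoff works with the same structure, so 5 min, 1 hour or an arbitrary 
limit need no extra bookkeeping on the logging path.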

bq. Good point; but I think the factor of 2 is too large - it will confuse 
users who set the limit(since the limit will be exceeded often). I've changed 
it to map.size() > maxUniqueMessages * 1.1f
I realized that if we set the clean-up threshold > maxUniqueMessages, the user 
can see more messages than the limit. How about doing clean-up under two 
conditions:
1) The user gets messages, and #messages > maxUniqueMessages
2) #messages > message-threshold; we can set message-threshold higher to 
avoid too-frequent clean-up.
Sounds good?
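
A rough sketch of the two conditions, to make the proposal concrete. The class 
BoundedMessageStore, its fields and the oldest-first eviction are all 
assumptions for illustration; the patch may track and evict messages 
differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the two clean-up conditions; not the patch's class. */
class BoundedMessageStore {
  private final int maxUniqueMessages;
  private final int cleanupThreshold; // assumed to be >= maxUniqueMessages
  // Unique message -> time it was last seen (milliseconds).
  private final Map<String, Long> lastSeen = new HashMap<>();

  BoundedMessageStore(int maxUniqueMessages, int cleanupThreshold) {
    this.maxUniqueMessages = maxUniqueMessages;
    this.cleanupThreshold = cleanupThreshold;
  }

  /** Write path. Condition 2: purge only once the higher threshold is exceeded. */
  synchronized void record(String message, long timestampMillis) {
    lastSeen.put(message, timestampMillis);
    if (lastSeen.size() > cleanupThreshold) {
      purge();
    }
  }

  /** Read path. Condition 1: purge first, so the user never sees more than the limit. */
  synchronized Map<String, Long> getMessages() {
    if (lastSeen.size() > maxUniqueMessages) {
      purge();
    }
    return new HashMap<>(lastSeen);
  }

  synchronized int size() {
    return lastSeen.size();
  }

  /** Drops the least recently seen messages until only maxUniqueMessages remain. */
  private void purge() {
    List<Map.Entry<String, Long>> entries = new ArrayList<>(lastSeen.entrySet());
    entries.sort(Map.Entry.comparingByValue());
    int excess = entries.size() - maxUniqueMessages;
    for (int i = 0; i < excess; i++) {
      lastSeen.remove(entries.get(i).getKey());
    }
  }
}
```

With maxUniqueMessages = 2 and a threshold of 4, the map may briefly hold up to 
4 entries between reads, but getMessages() always returns at most 2.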

bq. I'd prefer not to move it. All the common web ui classes(for the existing 
web ui) are in hadoop-yarn-common and I'll have to move everything over to 
I just tried moving it, and it doesn't seem to cause any more issues. Could you 
check that?

> Add errors and warning stats to RM, NM web UI
> ---------------------------------------------
>                 Key: YARN-2901
>                 URL: https://issues.apache.org/jira/browse/YARN-2901
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager, resourcemanager
>            Reporter: Varun Vasudev
>            Assignee: Varun Vasudev
>         Attachments: Exception collapsed.png, Exception expanded.jpg, Screen 
> Shot 2015-03-19 at 7.40.02 PM.png, apache-yarn-2901.0.patch, 
> apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, apache-yarn-2901.3.patch, 
> apache-yarn-2901.4.patch
> It would be really useful to have statistics on the number of errors and 
> warnings in the RM and NM web UI. I'm thinking about -
> 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day
> 2. The top 'n'(20?) most common exceptions in the past 5 min/1 hour/12 
> hours/day
> By errors and warnings I'm referring to the log level.
> I suspect we can probably achieve this by writing a custom appender?(I'm open 
> to suggestions on alternate mechanisms for implementing this).

This message was sent by Atlassian JIRA
