Varun Vasudev updated YARN-2901:
    Attachment: apache-yarn-2901.3.patch

bq.    It's better to use ReentrantReadWriteLock instead of a synchronized 
lock, since the class will be read more frequently and concurrently than it 
is written.

Actually, in this case, I suspect there will be more writes than reads. Reads 
occur only when a user loads the web page, whereas writes occur every time an 
error or warning is logged. I think the synchronized lock should be OK. On a 
VM on my MacBook Pro, the appender was able to process randomly generated 
error messages at 150K messages per minute without any 

bq.    getElementsAndCounts: when it tries to get the count for each 
message, it loops over the elements in qualifyingTimes. So in the extreme case 
where message="x" is logged once every second, it will loop over all 24 * 60 * 
60 * 2 = 172800 items. A solution could be complex. Either we don't remember 
every per-second count for some time, for example by hard-coding that we only 
remember the count for a single message over fixed ranges such as the past 
1 min, 5 min, 30 min, 1 h ..., or we introduce a tree-like structure, for 
example an interval tree (http://en.wikipedia.org/wiki/Interval_tree), to make 
the query more efficient. I think remembering a limited number of time ranges 
for each message should be enough. There seems to be no need to support cases 
like "give me the error count from 2:30:01 am to 3:40:05 pm". If you think the 
changes are manageable, you can do it in this patch, or you can file a ticket 
to address it in a separate 

I think I would prefer to support the use case of arbitrary time limits for 
now. The summation occurs only when a user loads the web UI, so the cost 
shouldn't be too bad. I don't think the interval tree makes it any easier, and 
it makes the implementation much more complex.
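
For illustration, here is a minimal sketch of summing per-second counts only 
at read time. It is not the code from the patch: the class name 
WindowCountSketch, the methods record and countsSince, and the use of a 
TreeMap per message are all assumptions made for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; names and data layout are assumptions, not the patch.
class WindowCountSketch {
  // message -> (epoch second -> number of occurrences logged in that second)
  private final Map<String, TreeMap<Long, Integer>> counts = new HashMap<>();

  // Write path: called for every logged error or warning.
  synchronized void record(String message, long epochSecond) {
    counts.computeIfAbsent(message, k -> new TreeMap<>())
        .merge(epochSecond, 1, Integer::sum);
  }

  // Read path: called only when the web UI is loaded.
  // Sums the per-second counts recorded at or after cutoffSecond, so any
  // cutoff can be queried without precomputed time buckets.
  synchronized Map<String, Integer> countsSince(long cutoffSecond) {
    Map<String, Integer> result = new HashMap<>();
    for (Map.Entry<String, TreeMap<Long, Integer>> e : counts.entrySet()) {
      int total = 0;
      for (int c : e.getValue().tailMap(cutoffSecond, true).values()) {
        total += c;
      }
      if (total > 0) {
        result.put(e.getKey(), total);
      }
    }
    return result;
  }
}
```

The read still walks every per-second entry inside the window, which is the 
cost discussed above; the sketch only shows that the write path stays cheap 
and the summation is paid once per page load.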

bq.    When map.size() > maxUniqueMessages, cleanup will be triggered. I 
suggest adding a buffer so that cleanup does not run too often, such as when 
map.size() > maxUniqueMessages * 2?

Good point, but I think a factor of 2 is too large. It will confuse users who 
set the limit, since the limit will often be exceeded. I've changed it to 
map.size() > maxUniqueMessages * 1.1f
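
As a rough sketch of that check (the class name CleanupThresholdSketch, the 
constant BUFFER_FACTOR and the method needsCleanup are hypothetical names 
chosen for this example, not taken from the patch):

```java
// Hypothetical sketch of the buffered cleanup trigger; not the patch code.
class CleanupThresholdSketch {
  // Allow the map to grow roughly 10% past the configured limit before
  // cleaning up, so cleanup does not run on every insert once the limit
  // is reached.
  static final float BUFFER_FACTOR = 1.1f;

  static boolean needsCleanup(int mapSize, int maxUniqueMessages) {
    return mapSize > maxUniqueMessages * BUFFER_FACTOR;
  }
}
```

With maxUniqueMessages set to 100, a map of 105 entries does not trigger 
cleanup, while a map of 111 entries does.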

bq. And could you take a look at findbugs warning and failed tests?

The failed tests are unrelated. I've fixed the findbugs warning.

> Add errors and warning stats to RM, NM web UI
> ---------------------------------------------
>                 Key: YARN-2901
>                 URL: https://issues.apache.org/jira/browse/YARN-2901
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager, resourcemanager
>            Reporter: Varun Vasudev
>            Assignee: Varun Vasudev
>         Attachments: Exception collapsed.png, Exception expanded.jpg, Screen 
> Shot 2015-03-19 at 7.40.02 PM.png, apache-yarn-2901.0.patch, 
> apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, apache-yarn-2901.3.patch
> It would be really useful to have statistics on the number of errors and 
> warnings in the RM and NM web UI. I'm thinking about -
> 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day
> 2. The top 'n' (20?) most common exceptions in the past 5 min/1 hour/12 
> hours/day
> By errors and warnings I'm referring to the log level.
> I suspect we can probably achieve this by writing a custom appender? (I'm 
> open to suggestions on alternate mechanisms for implementing this.)
