[ 
https://issues.apache.org/jira/browse/YARN-2901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Vasudev updated YARN-2901:
--------------------------------
    Attachment: apache-yarn-2901.3.patch

bq.    It's better to use ReentrantReadWriteLock instead of a synchronized 
lock, since the class will be read more concurrently/frequently than it is 
written.

Actually, in this case, I suspect there will be more writes than reads. Reads 
occur only when a user loads the web page, whereas writes occur every time an 
error or warning is logged, so I think the synchronized lock should be OK. On 
a VM on my MacBook Pro, the appender was able to process randomly generated 
error messages at 150K messages per minute without any trouble.
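
To make the trade-off concrete, here is a minimal sketch of a write-heavy 
counter guarded by a plain synchronized lock. It is an illustration only, not 
the code in the patch; the class and method names (MessageCounter, record, 
snapshot) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; not the appender class from the patch.
class MessageCounter {
  private final Map<String, Integer> counts = new HashMap<>();

  // Write path: would be called for every logged error or warning.
  synchronized void record(String message) {
    counts.merge(message, 1, Integer::sum);
  }

  // Read path: would be called only when the web page is loaded.
  // Returns a copy so the caller never iterates the live map.
  synchronized Map<String, Integer> snapshot() {
    return new HashMap<>(counts);
  }
}
```

A ReentrantReadWriteLock only helps when many readers would otherwise block 
each other; with writes dominating, every record() call would still need the 
exclusive write lock.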

{quote}
    In getElementsAndCounts, computing the count for each message loops over 
the elements in qualifyingTimes. So in the extreme case, where message="x" is 
logged once every second, it will loop over all 24 * 60 * 60 * 2 = 172800 
items. A solution could be complex: either we stop remembering every 
per-second count and instead hard-define that we only keep counts for a 
single message over fixed windows such as the past 1 min, 5 min, 30 min, 
1 h ..., or we introduce a tree-like structure, for example an interval tree 
(http://en.wikipedia.org/wiki/Interval_tree), to make the query more 
efficient. I think remembering a limited number of time ranges for each 
message should be enough. There seems to be no need to support cases like 
"give me the error count from 2:30:01 am to 3:40:05 pm". If you think the 
changes are manageable, you can make them in this patch, or you can file a 
ticket to address this in a separate patch.
{quote}

I think I would prefer to support the use case of arbitrary time limits for 
now. The summation occurs only when a user loads the web UI, so the cost 
shouldn't be too bad. I don't think the interval tree makes it any easier, and 
it makes the implementation much more complex.
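
As a rough sketch of what summing on demand over an arbitrary cutoff looks 
like, assuming counts are kept per second in a sorted map (the patch's actual 
data structure may differ; PerSecondCounts, record and countSince are 
hypothetical names):

```java
import java.util.TreeMap;

// Hypothetical illustration; not the data structure from the patch.
class PerSecondCounts {
  // Timestamp in seconds -> number of occurrences logged in that second.
  private final TreeMap<Long, Integer> countsBySecond = new TreeMap<>();

  // Write path: bump the bucket for the second in which the message was logged.
  synchronized void record(long epochSecond) {
    countsBySecond.merge(epochSecond, 1, Integer::sum);
  }

  // Read path: sum every bucket at or after the cutoff. The cost is linear
  // in the number of qualifying seconds and is paid only on a page load.
  synchronized long countSince(long cutoffSecond) {
    long total = 0;
    for (int count : countsBySecond.tailMap(cutoffSecond, true).values()) {
      total += count;
    }
    return total;
  }
}
```

Any cutoff works here, at the price of the worst-case linear scan that the 
review comment describes.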

bq.    When map.size() > maxUniqueMessages, cleanup will be triggered. I 
suggest adding a buffer so that cleanup does not run too often, such as 
triggering it when map.size() > maxUniqueMessages * 2?

Good point, but I think the factor of 2 is too large: it will confuse users 
who set the limit (since the limit will be exceeded often). I've changed it to 
map.size() > maxUniqueMessages * 1.1f
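
A minimal sketch of that check, assuming an int limit; CleanupThreshold and 
shouldCleanup are hypothetical names, not the ones used in the patch:

```java
// Hypothetical illustration of the 10% buffer; not the code from the patch.
class CleanupThreshold {
  // Returns true only once the map has grown past the configured limit
  // by more than the 10% buffer.
  static boolean shouldCleanup(int mapSize, int maxUniqueMessages) {
    return mapSize > maxUniqueMessages * 1.1f;
  }
}
```

With a limit of 100, a map holding 105 entries is left alone, while one 
holding 111 entries triggers cleanup.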

bq. And could you take a look at findbugs warning and failed tests?

The failed tests are unrelated. I've fixed the findbugs warning.

> Add errors and warning stats to RM, NM web UI
> ---------------------------------------------
>
>                 Key: YARN-2901
>                 URL: https://issues.apache.org/jira/browse/YARN-2901
>             Project: Hadoop YARN
>          Issue Type: New Feature
>          Components: nodemanager, resourcemanager
>            Reporter: Varun Vasudev
>            Assignee: Varun Vasudev
>         Attachments: Exception collapsed.png, Exception expanded.jpg, Screen 
> Shot 2015-03-19 at 7.40.02 PM.png, apache-yarn-2901.0.patch, 
> apache-yarn-2901.1.patch, apache-yarn-2901.2.patch, apache-yarn-2901.3.patch
>
>
> It would be really useful to have statistics on the number of errors and 
> warnings in the RM and NM web UI. I'm thinking about -
> 1. The number of errors and warnings in the past 5 min/1 hour/12 hours/day
> 2. The top 'n' (20?) most common exceptions in the past 5 min/1 hour/12 
> hours/day
> By errors and warnings I'm referring to the log level.
> I suspect we can probably achieve this by writing a custom appender? (I'm 
> open to suggestions on alternate mechanisms for implementing this.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
