https://bugzilla.wikimedia.org/show_bug.cgi?id=28493
Krinkle <[email protected]> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |[email protected]
            Summary|Create an error reporting   |Monitor and index error
                   |service on IRC for          |logs for trends and new
                   |Wikimedia                   |errors

--- Comment #9 from Krinkle <[email protected]> ---
So, we've got:

* Aggregated logs on the servers:
  https://wikitech.wikimedia.org/wiki/Logs
* Ganglia and graphite graphing some of these as numerical statistics, but
  no actual errors or trends. One has to open the logs for details.

That's fine when working on a major exception spike (regression), but when
trying to find minor notices and warnings that don't affect everyone, we
need something else.

translatewiki.net has an IRC bot echoing all these error logs. That's too
much for us (at the very least we'd need to de-duplicate things). However, I
think it should be feasible to develop something that monitors these logs,
groups similar errors (similar to how we group them in fatalmonitor), and
reports to IRC only when a new error is first seen or when a previously seen
error becomes significantly more common.

We need to be careful about what is exposed, but all in all, a nice web
dashboard to show the details and an IRC bot to report trends and new errors
could be quite useful.

The web dashboard should probably not be written from scratch (perhaps use
logstash). If it also has an API to query trends and new errors, we can
write an IRC reporter off of that.

This would either need to run in production (proxied through fenari or
whatever we do for things like graphite/gdash these days), or we'd need to
replicate the necessary data to a wmflabs instance.
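The grouping-and-reporting idea in comment #9 could be sketched roughly as
follows. This is only an illustration under stated assumptions: `signature`,
`TrendMonitor`, the regex rules, and the `spike_factor` / `min_count`
thresholds are hypothetical names and values, not part of fatalmonitor,
logstash, or any existing Wikimedia tool. It compares each window of log
lines against the count last seen for the same normalised message, which is
a much cruder baseline than a real trend detector would use.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical normalisation rules: replace the variable parts of a message
# so that "the same" error maps to one signature.
_RULES = [
    (re.compile(r"0x[0-9a-fA-F]+"), "<hex>"),
    (re.compile(r"\d+"), "<n>"),
    (re.compile(r"'[^']*'|\"[^\"]*\""), "<str>"),
]


def signature(message: str) -> str:
    """Collapse a raw log message into a de-duplication key."""
    for pattern, token in _RULES:
        message = pattern.sub(token, message)
    return message.strip()


class TrendMonitor:
    """Report only new signatures and signatures that spike in frequency."""

    def __init__(self, spike_factor: float = 3.0, min_count: int = 5) -> None:
        self.spike_factor = spike_factor  # assumed threshold, tune as needed
        self.min_count = min_count        # ignore spikes below this volume
        self.last_count: Dict[str, int] = {}  # signature -> last seen count

    def process_window(self, messages: Iterable[str]) -> List[str]:
        """Take one time window of log lines, return lines worth reporting."""
        counts = Counter(signature(m) for m in messages)
        reports = []
        for sig, count in sorted(counts.items()):
            previous = self.last_count.get(sig)
            if previous is None:
                reports.append(f"NEW ({count}x): {sig}")
            elif count >= self.min_count and count >= self.spike_factor * previous:
                reports.append(f"SPIKE ({previous} -> {count}): {sig}")
        # Remember the latest count for every signature seen in this window.
        self.last_count.update(counts)
        return reports


if __name__ == "__main__":
    monitor = TrendMonitor()
    window = ["Undefined index: foo in /a.php on line 10"] * 2
    for line in monitor.process_window(window):
        print(line)  # an IRC reporter would send this to a channel instead
```

An IRC reporter would call `process_window` once per interval and post the
returned lines; a dashboard API, if one exists, could replace the in-memory
`last_count` dictionary as the source of historical counts.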
