I've mentioned this briefly to Arthur: now that we have a Team Practices Group, I think we should (at least) let them know when we identify something that needs a cultural change, especially if we're going to try to effect such a change. This is apropos of https://phabricator.wikimedia.org/T89049
On Tue, Feb 10, 2015 at 8:27 PM, Ori Livneh <[email protected]> wrote:

> On Tue, Feb 10, 2015 at 3:44 PM, Nuria Ruiz <[email protected]> wrote:
>
>> >One possibility is channeling the errors to Sentry which has
>> >Phabricator integration. The ticket for doing that in beta is
>> >https://phabricator.wikimedia.org/T85239, I'm hoping to be able to work
>> >on it within a few weeks.
>>
>> Sounds really good. I think grouping errors in Sentry and starting to
>> assign the biggest offenders (which might not be the most dangerous, but
>> are the ones that pollute the log the most) via Phabricator tasks will be
>> a step towards the cultural shift Antoine was talking about.
>>
>> On Tue, Feb 10, 2015 at 2:11 PM, Gergo Tisza <[email protected]>
>> wrote:
>>
>>> On Tue, Feb 10, 2015 at 1:27 PM, Antoine Musso <[email protected]>
>>> wrote:
>>>
>>>> A feature request for the audience: the ability in Logstash to
>>>> associate a message fingerprint with a Phabricator task. This way we
>>>> could filter out triaged messages and focus on newcomers.
>>>>
>>>
>>> One possibility is channeling the errors to Sentry which has Phabricator
>>> integration. The ticket for doing that in beta is
>>> https://phabricator.wikimedia.org/T85239, I'm hoping to be able to work
>>> on it within a few weeks.
>>>
>>> Also, it would be nice if the system would take a guess at who caused
>>> the error and alert them directly. Squash for example can git blame the
>>> stack trace and find the most recent change: http://squash.io/
>>>
>>
> Ok, but the absence of these conveniences is not a blocker to getting this
> daily routine set up. Chad and Antoine know MediaWiki's logging
> infrastructure better than most.
>
> I agree with Antoine that responsibility for monitoring failures should be
> distributed, but I also note that our attempts to tackle this problem
> collectively have failed. There has to be a cultural change, yes, but a
> specific party has to own this and be accountable.
> If you come to feel that deployers are exploiting you by neglecting to
> monitor the changes they push out, as the Release Engineering team, you
> have ways to respond.
>
> _______________________________________________
> Engineering mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/engineering
_______________________________________________
teampractices mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/teampractices
