Re: [Wikitech-l] 📈 Wikimedia production errors help

Ed Sanders Tue, 22 Sep 2020 10:00:03 -0700

Speaking specifically about the new JavaScript error logging, and
specifically to Alex's point about triaging these tasks, it would be very
helpful if the reports included some indication of how often the error is
occurring.


For example, VisualEditor is loaded several hundred thousand times per
day. If an error has occurred 4 times in the last 30 days (based on a
recent example) then it is probably very low priority.
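
To make that arithmetic concrete, here is a rough sketch of how an occurrence count could be turned into a rate for triage. The function name and the figure of 300,000 loads per day are assumptions for illustration (standing in for "several hundred thousand"); they are not taken from the actual error reports or logging pipeline.

```python
def error_rate_per_million(error_count: int, days: int, loads_per_day: int) -> float:
    """Return errors per million loads over the reporting window."""
    total_loads = loads_per_day * days
    if total_loads <= 0:
        raise ValueError("total loads must be positive")
    return error_count / total_loads * 1_000_000


# Assumed figure: 300,000 loads/day stands in for "several hundred thousand".
rate = error_rate_per_million(error_count=4, days=30, loads_per_day=300_000)
print(f"{rate:.2f} errors per million loads")
```

Under those assumed numbers the error affects well under one load in a million, which is the kind of figure that would help rank a report against others.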

On Thu, 17 Sep 2020 at 16:40, C. Scott Ananian <canan...@wikimedia.org>
wrote:

> ACN -- for what it's worth, I've been working for the foundation for a
> while now, and I can report from the inside that the trend is definitely in
> a positive direction.  There is a lot more internal focus on addressing
> code debt and giving maintenance a fair spot at the table.  (In fact, my
> entire team is sitting inside 'maintenance' now, apparently; we used to
> be 'platform evolution'.)  This email thread is one visible aspect of that
> focus on code quality, not just features.
>
> That said, the one aspect which hasn't improved much in my time at the
> foundation has been the tendency of teams to work in silos.  This thread
> also seems to be a symptom of that: a bunch of production issues are being
> dropped on the floor ('not resolved in over a month') because they are
> falling between the silos and nobody knows who is best able to fix them.
> There are knowledge/expertise gaps among the silos as well: someone
> qualified to fix a DB issue might be at sea trying to track down a front
> end bug, and vice-versa---a number of generalists in the org could
> technically tackle a bug no matter where it lies, but it will take them
> much longer to grok an unfamiliar codebase than it would for someone more
> familiar with that silo.  So bug triage is an increasingly technical task
> in its own right.
>
> This thread, as I read it sitting inside the org, isn't so much asking for
> more attention to be paid to maintenance -- we're winning that battle,
> internally -- as it is a plea for those folks on the edges of their silos
> to keep an eye out for these things which are currently falling between
> them and help with the triage.
>   --scott, speaking only for myself and my view here
>
>
>
> On Wed, Sep 16, 2020 at 11:25 PM AntiCompositeNumber <
> anticompositenum...@gmail.com> wrote:
>
> > There is an impression among many community members, myself included,
> > that Foundation development generally prioritizes new features over
> > fixing existing problems. Foundation teams will sprint for a few
> > months to put together a minimum viable product, release it, then move
> > on to the new hotness, leaving user requests, bugfixes, and the like
> > behind. It often seems that the only way to get a bug fixed is to get
> > a volunteer developer to look at it. This is likely unintentional, but
> > it happens nonetheless.
> >
> > Putting a higher priority within the Foundation on cleaning up old
> > toys before taking out new ones is necessary for the long-term
> > stability of the projects.
> >
> > ACN
> >
> > On Wed, Sep 16, 2020 at 9:05 PM Dan Andreescu <dandree...@wikimedia.org>
> > wrote:
> > >
> > > >
> > > > For example, of the 30-odd backend errors reported in June, 14 were still
> > > > open a month later in July [1], and 12 were still open – three months later
> > > > – in September. The majority of these haven't even yet been triaged,
> > > > assigned or otherwise acknowledged. And meanwhile we've got more
> > > > (non-JavaScript) stuff from July, August and September adding pressure. We
> > > > have to do better.
> > > >
> > > > -- Timo
> > > >
> > >
> > > This feels like it needs some higher level coordination.  Like perhaps
> > > managers getting together and deciding production issues are a priority and
> > > diverting resources dynamically to address them.  Building an awesome new
> > > feature will have a lot less impact if the users are hurting from growing
> > > disrepair.  It seems to me like if individual contributors and maintainers
> > > could have solved this problem, they would have by now.  I'm a little
> > > worried that the only viable solution right now seems like heroes stepping
> > > up to fix these bugs.
> > >
> > > Concretely, I think expanding something like the Core Platform Team's
> > > clinic duty might work.  Does anyone have a very rough idea of the time it
> > > would take to tackle 293 (wow we went up by a dozen since this thread
> > > started) tasks?
> > > _______________________________________________
> > > Wikitech-l mailing list
> > > Wikitech-l@lists.wikimedia.org
> > > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> >
>
>
> --
> (http://cscott.net)
