Hi all

A few questions to provoke discussion and share knowledge:
* Why does the train run Tue, Wed, Thu rather than Mon, Tue, Wed?
* Why do we only have two group 1 Wikipedias (Catalan and Hebrew)?
* Should there be a backport window Friday mornings for certain changes?

Longer spiel:

A few weeks ago a change I made led to a small but noticeable UI
regression. The site was perfectly usable, but looked visibly off. It was
in a more obscure part of the UI, so we missed it during QA and code review.

Late Wednesday a ticket was filed against Wikimedia Commons, but I only
became aware of it late Thursday when the regression rolled out to English
Wikipedia. A village pump discussion was started and several duplicate
tickets were created. While the site could still be used, it didn't look
great and disrupted the experience of many editors.

Once aware of the problem, the issue was easy to fix. A patch was written
on Friday.

I understand Friday backports are possible, but my team tends to use them
as a last resort, for fear of creating more work for fellow maintainers
over the weekend. As a result, given the site was still usable, the fix
wasn't backported until the first available backport window on Monday. This
is unfortunately a regular pattern, particularly for small UI regressions.

We addressed the issue on Monday, but I got feedback from several users
that this particular issue took too long to get backported. I mentioned the
no-Friday-deploy policy. One user asked me why we don't run the train
Monday-Wednesday and, to be honest, I wasn't sure. I couldn't find an
explanation on https://wikitech.wikimedia.org/wiki/Deployments/Train.

My team tries to avoid big changes on Mondays, as patches merged on Monday
are more likely to have issues: they don't always get time to go through QA
with our dedicated QA engineer during the week.

So... why don't we run the train Monday-Wednesday? Having a Thursday buffer
during which we can more comfortably backport fixes for any issues not
caught in testing, particularly UI bugs, would be extremely helpful to my
team, and I don't think we'd lose much by giving up Monday as a day to rush
in last-minute changes.

Assuming there are good reasons for a Tuesday-Thursday train, I think there
is another problem with our deploy process, which is the size of group 1.
Given the complexity of our interfaces (several skins, gadgets, multiple
special pages, user preferences, multiple extensions, and different user
rights), many obscure UI bugs get missed in QA by people who don't use the
software every day and so lack a clear mental model of how it looks and
behaves. My team mostly works on visible user interface changes and we rely
heavily on Catalan and Hebrew Wikipedia users, our only group 1 Wikipedias,
to notice errors with the UI before they go out to a wider audience. Given
how small those audiences are, that often doesn't work, and it's often
group 2 wikis that make us aware of issues.

If we are going to keep the existing Tue-Thu train, I think it's essential
that we have at least one larger Wikipedia in our group 1 deploy to give us
better protection against UI regressions living over the weekend. My
understanding is that, for some reason, this is not a decision release
engineering can make, but one that requires an on-wiki RFC by the editors
themselves. Is that correct? While I can understand the reluctance of
editors to experience bugs, I'd argue that it's better to have a bug for a
day than to have it for an entire weekend, and this is definitely something
we need to think more deeply about.
