[Wikitech-l] Re: Drop dedicated support for IE 11, Edge Legacy and Opera

2024-05-21 Thread Krinkle
On Tue, 14 May 2024, at 18:42, Krinkle wrote:
> I suggest we fix https://phabricator.wikimedia.org/T342267 before making any 
> decision that is based on browser usage […].
> 
> By "fix" I mean, ask your respective managers to [create] demand for it and 
> create interest in it. [it] has thus far not been prioritised. […]

Correction: The task is part of a sprint as of two weeks ago. I failed to see 
this earlier. Yay!

-- Timo
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

[Wikitech-l] Re: Drop dedicated support for IE 11, Edge Legacy and Opera

2024-05-14 Thread Krinkle
I suggest we fix https://phabricator.wikimedia.org/T342267 before making any 
decision that is based on browser usage metric, unless we can demonstrate by 
other means that Opera isn't used by up to 10% of page views.

By "fix" I mean, ask your respective managers to demand it and create interest 
in it. I have regularly raised it internally since 2018, but investigating an 
apparent defect whereby over 10% of global data may be missing, has thus far 
not been prioritised.

I have no opinion on IE11/Android as those are obviously low on usage without 
requiring evidence. Given that Opera is Chromium-based, and evergreen, and 
already listed as "Last N years", I would be skeptical of dropping that solely 
based on usage data.

Is there a similar gain in CSS/HTML baseline by dropping such entry?

I note that HTML summary/details do not require native support in Basic to 
adopt. They were specced by WHATWG specifically with progressive enhancement in 
mind. Browsers render content in unknown elements by default (at least, since 
IE6).

-- Timo

On Tue, 14 May 2024, at 00:17, Volker E. wrote:
> Hi everyone,
> the Design System Team (DST) is proposing the following changes to MediaWiki 
> browser support [1]:
> - Drop support for Internet Explorer 11 (IE 11)
> - Drop support for all versions of Edge Legacy
> - Drop support for Opera
> - Increase Basic (Grade C) support for Chrome and Firefox to versions 49+, 
> Safari and iOS to versions 10+. 
> 
> What this means: The browsers we’re phasing out won’t be tested for layout 
> rendering anymore. While users on these browsers might and will still be able 
> to read and basically interact with content, they might experience some 
> quirks. This step helps us integrate modern web features more seamlessly.
> 
> These changes will unlock the ability to use specific newer browser features 
> that cannot be safely used today without requiring a fallback, notably CSS 
> custom properties (used in upcoming reading customization features like Night 
> Mode) and the <details> and <summary> HTML elements that can be used to 
> replace the checkbox hack.
> This will reduce the amount of code sent to 99.9% of users and cut down on 
> software development costs and maintenance burdens.
> 
> See the full announcement for more details; PDF to download [2].
> 
> On behalf of the Wikimedia Foundation Design System Team,
> Volker
> [1] https://www.mediawiki.org/wiki/Compatibility#Browser_support_matrix
> [2] https://phabricator.wikimedia.org/F52025988

[Wikitech-l] Re: Collect telemetry (using WikimediaEvents) for a gadget

2023-11-30 Thread Krinkle
I would suggest using the Statsd counters that WikimediaEvents exposes to 
MediaWiki JavaScript (including Gadgets and user scripts!). This is a public 
API, with aggregate data publicly accessible via Grafana.

These require no server-side configurations, schemas, or private data access. 
And (on the flipside) also do not record any personal information.

To use it, call mw.track( 'counter.gadget_<GadgetName>.<metric_name>' ) in your 
gadget.

For example:

mw.track( 'counter.gadget_VariantAlly.storage_empty_dialog' );
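As a rough sketch of how a gadget might keep its counter names consistent, the helper below builds topics of that shape. The helper name `makeGadgetCounter`, the `recorded` fallback, and the second metric name are made up for illustration; only `mw.track` and the `counter.gadget_` prefix come from the message above.

```javascript
// Hypothetical helper (not part of MediaWiki): returns a function that
// emits counters named 'counter.gadget_<gadgetName>.<metric>'.
function makeGadgetCounter( gadgetName, track ) {
	return function ( metric ) {
		var topic = 'counter.gadget_' + gadgetName + '.' + metric;
		track( topic );
		return topic;
	};
}

// In a gadget, wrap mw.track. Outside MediaWiki (e.g. when trying this
// in Node.js) fall back to recording the topics in an array.
var recorded = [];
var track = ( typeof mw !== 'undefined' && mw.track ) ?
	function ( topic ) { mw.track( topic ); } :
	function ( topic ) { recorded.push( topic ); };

var count = makeGadgetCounter( 'VariantAlly', track );
count( 'storage_empty_dialog' );
count( 'dialog_dismissed' ); // assumed metric name, for illustration only
```

Keeping the prefix in one place makes it harder to mistype a counter name, which matters because a misspelled topic would simply show up as a separate series.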

To make visualising easier, I've put together a generic dashboard to plot these:
https://grafana.wikimedia.org/d/00037/gadget-stats

--
Timo Tijhof
https://timotijhof.net/




On Mon, 27 Nov 2023, at 09:33, psnbaotg via Wikitech-l wrote:
> 
> Hi there,
> 
> I'm User:Diskdance, and recently I'm developing a default gadget for Chinese 
> Wikipedia enhancing MediaWiki's variant handling logic, and under certain 
> circumstances a prompt is shown at page load asking for a user's preferred 
> variant. Consider it as a conditional Cookie notice, and its English 
> screenshot can be found at 
> https://commons.wikimedia.org/wiki/File:VariantAlly-En.png.
> 
> I *know* this can be very disruptive on UX, so I tend to be careful about its 
> negative impact on page views. If the gadget can collect telemetry data about 
> the prompt's display frequency and user interactions (using e.g. 
> WikimediaEvents), I can know about its possible impact.
> 
> Is this possible? It would be much appreciated if anybody could provide 
> assistance.
> 
> Best wishes,
> Diskdance
> 

[Wikitech-l] Re: Global Lua modules and templates

2023-08-29 Thread Krinkle
I think this is one of the most awesome things I've seen in a long time.
Has my full support! Thank you for building this and sharing it with everyone!

I've been asked whether we should be concerned about this gadget in terms of 
site load or site performance. I'll share my analysis here also for Felipe's 
benefit and for transparency to others.

*== Performance ==*
Firstly, I'll look at potential database and web server load. What is the 
exposure of this feature? How many queries does it make for a typical 
interaction? Are those queries expensive? Are they well-cached within 
MediaWiki? Are they cachable at the HTTP level (e.g CDN/Varnish cache)?

Secondly, I'll look at how it makes edits. Does it always attribute edits to 
the actor? What security risk might there be? Does it act on stale data? Does 
it prevent conflicts under race conditions? Does it handle rate limits?

*== Server load ==*

The gadget looks well-written and well-guarded. It was easy to figure out what 
it does and how it does it. For a typical interaction, the gadget initially 
makes ~100 requests to the Action API, which can be seen in the browser 
DevTools under the Network tab. It uses the *MediaWiki* *Action API* to fetch 
page metadata and raw page content.

There's a couple of things Synchronizer does that I like:

* The user interface requires specific input in the form of a Wikidata entity 
and a wiki ID. This means upon casually stumbling on the feature, it won't do 
anything until and unless the person has understood what needs to be entered 
and enters that. A good example is shown through placeholder text, but you have 
to actually enter something before continuing.
* Once submitted, before anything else, Synchronizer checks whether the client 
is logged-in and thus likely to be able to later make edits through this 
gadget. If you're not logged-in, it won't make that burst of 100 API requests. 
Even though a logged-out user could successfully make those API requests to 
render the initial overview, this avoids significant load that would only leave 
the user stranded later on. This removes the vast majority of risk right here. 
In the event of viral linking or misunderstanding of what the feature is for, 
it would have little to no impact.
* When gathering data, it uses *Wikidata* to discover which wikis are known to 
have a version of the same template. This greatly limits the fan-out to 
"only" 10-100 wikis as opposed to ~1000 wikis. It also makes sense from a 
correctness standpoint as templates don't have the same name on all wikis, and 
those that do share a name may not be compatible or otherwise conceptually the 
same. Wikidata is awesome for this.

Synchronizer uses the Action API to fetch page metadata and raw page content. 
This kind of request is fairly cheap and benefits from various forms of caching 
within MediaWiki. These API modules don't currently offer CDN caching, but, I 
don't think that's warranted today given they're fast enough. If this feature 
were accessible to logged-out users and if it made these queries directly when 
merely visiting a URL, then we'd need to think about HTTP caching to ensure 
that any spikes in traffic can be absorbed by our edge cache data centres.
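For readers unfamiliar with this kind of request, here is a minimal sketch of an Action API query for a page's latest raw content and timestamp. The message does not show Synchronizer's actual queries, so the function name, the example wiki, and the page title are assumptions; the parameters themselves (`action=query`, `prop=revisions`, `rvprop`, `rvslots`) are standard Action API parameters.

```javascript
// Sketch only: builds a URL for fetching raw page content.
// `buildContentQuery` is a hypothetical name, not taken from the gadget.
function buildContentQuery( apiBase, title ) {
	var params = new URLSearchParams( {
		action: 'query',
		prop: 'revisions',
		rvprop: 'content|timestamp',
		rvslots: 'main',
		titles: title,
		format: 'json',
		formatversion: '2'
	} );
	return apiBase + '?' + params.toString();
}

// Example wiki and title chosen for illustration only.
var url = buildContentQuery( 'https://en.wikipedia.org/w/api.php', 'Module:Example' );
```

Within a gadget, the same parameters would normally be passed as an object to mw.Api or mw.ForeignApi rather than assembled into a URL by hand.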

There is one improvement I can suggest, which is to limit the number of 
concurrent requests. It works fine today but it does technically violate 
*MediaWiki API Etiquette* https://www.mediawiki.org/wiki/API:Etiquette, and may 
get caught in API throttling. To mitigate this, you could tweak the for-loop in 
"getMasterData". This currently starts each `updateStatus` call in parallel to 
each other. If updateStatus returned a Promise, then this loop could instead 
chain together the calls in a linear sequence. For example, `sequence = 
Promise.resolve(); for (…) { sequence = sequence.finally( updateStatus.bind( 
null, module ) ); }`.
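Spelled out, that chaining idea might look like the sketch below. It assumes updateStatus returns a Promise; `fakeUpdateStatus`, `runInSequence`, and the module names are stand-ins invented for this example, not code from the gadget.

```javascript
// Runs task( item ) for each item, one after another.
function runInSequence( items, task ) {
	var sequence = Promise.resolve();
	items.forEach( function ( item ) {
		// .finally() waits for the Promise returned by the callback, and
		// moves on to the next item even if an earlier call rejected.
		sequence = sequence.finally( task.bind( null, item ) );
	} );
	return sequence;
}

// Stand-in for the gadget's updateStatus: logs when it starts and ends.
var log = [];
function fakeUpdateStatus( module ) {
	log.push( 'start ' + module );
	return new Promise( function ( resolve ) {
		setTimeout( function () {
			log.push( 'end ' + module );
			resolve();
		}, 5 );
	} );
}

var done = runInSequence( [ 'A', 'B', 'C' ], fakeUpdateStatus );
```

Each call only starts once the previous one has settled, so at most one request is in flight at a time. A small fixed concurrency (say, two or three at once) would be a middle ground if fully serial turns out to be too slow.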

*== Editing ==*

For editing, Synchronizer uses all the built-in affordances correctly. E.g. 
mw.Api and mw.ForeignApi as exposed 
by the *mediawiki.api ResourceLoader module*. This gives the gadget a stable 
surface, greatly reduces complexity of the implementation, and automatically 
does all the right things.

I really like that the gadget goes the extra mile of cleverly figuring out why 
a local template differs from the central one. For example, does the local 
copy match one of the previous versions of the central template? Or was it 
locally forked in a new direction? It then also allows you to preview a diff 
before syncing any given wiki. This is really powerful and empowers people to 
understand their actions before doing it.

To show what could happen if someone didn't use the provided mw.Api JS utility 
and naively made requests directly to the API endpoint:
* edit() takes care of preventing unresolved *edit conflicts*. It uses the Edit 
API's `basetimestamp` parameter, so that if someone else edited the 

[Wikitech-l]  Fresh 23.08.1 released!

2023-08-10 Thread Krinkle
TLDR: fresh-node now uses Node 18. To learn more or install/upgrade, refer to 
https://gerrit.wikimedia.org/g/fresh.

Hi all,

Fresh 23.08.1 is upon us. It's a fairly big release!

*What's new?*

After a delay of more than two years, Node.js 18 and npm 9 (replacing Node.js 
16 and npm 7) are available to WMF CI as of yesterday, and can now be used 
locally through Fresh! We've also added early support for Node.js 20, which you 
can opt into via the fresh-node20 command.

The default fresh-node command was updated to Node.js 18. You can continue to 
use older versions for another 6-12 months via the fresh-node16 and 
fresh-node14 commands. The fresh-node12 command has been removed (unsupported 
since last year).

We've also made a few minor tweaks to improve support for Podman 
to encourage competition and use of freely-licensed 
software (Docker for Mac/Windows require the proprietary Docker Desktop). We've 
also improved experimental support for Apple ARM-based devices. Node, npm, and 
Firefox work out of the box using Docker's default emulation (including the 
faster Rosetta-based emulation). For example, using `npx grunt karma:firefox`. 
Chrome has yet to support emulation within Docker.

Full changelog: https://gerrit.wikimedia.org/g/fresh/+/23.08.1/CHANGELOG.md
Commits: https://gerrit.wikimedia.org/g/fresh/+log/23.08.1/

*What was the hold up?*

The updating and creation of new Docker images for WMF CI is rather simple — 
involving little more than a directory copy or a one-line change (example 1, 
example 2, example 3). We 
run a very efficient operation here, with people like James Forrester, Antoine 
Musso, and myself routinely migrating the entire Wikimedia movement's CI 
workloads in mere minutes! We use Zuul to automatically churn through hundreds 
of Jenkins jobs, used on a daily basis by thousands of Gerrit's Git 
repositories. It's quite the feat of automation really.

The holdup was in upgrading the browser tests of several product features to 
Webdriver.io v7. (Browser tests are sometimes associated with "selenium" for 
hysterical raisins, they involve no Selenium tech today.)

Webdriver 6 and earlier use the deprecated Node.js Fibers 
functionality to emulate async code as 
synchronous code. This allowed our developers to write test cases using a 
simpler syntax in which async-await statements could be omitted. The underlying 
functionality for this was discontinued in Node.js 16. Thus until a team 
migrated their tests to Webdriver 7, the associated CI pipeline would have to 
remain on Node.js 14.
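To illustrate what the migration involves, here is a small self-contained sketch. `fakeBrowser` is a made-up stand-in rather than the real WebdriverIO client; it only mimics the fact that commands return Promises once the Fibers-based emulation is gone.

```javascript
// Hypothetical stand-in for a browser automation client.
var fakeBrowser = {
	getTitle: function () {
		return Promise.resolve( 'Main Page' );
	}
};

// Without the synchronous emulation, omitting `await` yields a Promise
// rather than the value, so every command in a test must be awaited.
async function checkTitle( browser ) {
	var withoutAwait = browser.getTitle();
	var withAwait = await browser.getTitle();
	return {
		withoutAwaitIsPromise: withoutAwait instanceof Promise,
		title: withAwait
	};
}

var result = checkTitle( fakeBrowser );
```

This is why the upgrade touches nearly every line of an existing test: each command call needs an `await`, and each enclosing function needs to become `async`.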

One of the strengths of our WMF CI setup is that projects can bi-directionally 
integrate. This is one of many capabilities that Zuul and a number of other 
large CI systems have provided for over a decade, that e.g. GitHub/GitLab 
(still in their infancy around CI) are only just starting to explore. Zuul 
allows MediaWiki core to ensure its changes can't break extensions, and 
likewise allows each extension to ensure it can't break other extensions that 
it is meant to work together with, e.g. the MediaWiki bundle, or WMF 
production. We currently run this on a single Docker container, which supports 
local testing without a new and different dev environment for each of the 1000+ 
extensions. This is efficient from a big picture perspective, especially as a 
non-profit (though big corporations seem to make the same choice). We've also 
designed our infrastructure to generally support two or three major versions 
simultaneously, which allows individual teams to schedule maintenance at their 
own convenience. 

But, all this relies on the assumption that teams are aligned, that products in 
production have an owner, and that owners discover and (eventually) prioritise 
routine maintenance within 3-6 months.

Those assumptions allow for wide margins, but we don't meet them anymore, 
with dozens of major capabilities and products having lapsed 
in ownership over the 
years (or with existing ownership insufficiently exercised). In addition, the 
CI infrastructure lacks a directly responsible individual. Thus teams' upgrade 
tasks can linger for years, with 
nobody accountable for executing a chosen set of high-level priorities or 
knowing the needs of individual MediaWiki development teams. Everybody is 
waiting for somebody else while we continue to progress within our local maxima.

Our support window covered Node 12, 14 and 16. The volunteers maintaining the 
CI jobs (James and myself in this 

[Wikitech-l]  Flame graphs arrive in WikimediaDebug!

2023-06-12 Thread Krinkle
TLDR: The new "Excimer UI" option in WikimediaDebug generates flame graphs 
on-demand! Open this example to browse 
a profile I captured earlier from Wikipedia's Main Page.

To learn what this feature is, why we built it, and when you might use it, read 
this week's post on the Techblog:
https://techblog.wikimedia.org/2023/06/08/flame-graphs-arrive-in-wikimediadebug/
 

Quick start: Capture your own profile

If you haven't already, install WikimediaDebug via Firefox Browser Add-ons or 
Chrome Web Store.


1. Navigate to an article on Wikipedia.
2. Set the widget to "On" with the "Excimer UI" option checked.
3. Reload the page.

A profile link is now appended to the WikimediaDebug popup. Click it!

Excimer can instrument pageviews, edits, search suggestions from the MediaWiki 
API, JavaScript loading (ResourceLoader load.php), and anything else served by 
MediaWiki from a WMF domain name—including Commons, Wikidata, and mediawiki.org.

--
Timo Tijhof,
Principal Engineer,
Performance Team,
Wikimedia Foundation.

[Wikitech-l] Re-introducing MediaWiki CLI profiler for maintenance scripts

2023-05-17 Thread Krinkle
TLDR: Those with shell access at WMF can now run maintenance scripts on mwdebug 
hosts, and use the *--profiler=text* option to produce a report detailing how 
long the script spent in each MediaWiki component, class, and function.

== *What* ==

The MediaWiki platform at WMF consists of broadly four different deployments: 
app servers, api_app servers, job runners, and maint servers (diagram). 
Each can be thought of as its own service for a specific purpose, composed of 
a subset of components from the MediaWiki codebase with fast local access to 
each. The largest of these is the appservers cluster, which is served by 150 
servers of dedicated hardware across the two application data centers, 
and is responsible for 
responding to index.php (i.e. page views) and load.php (CSS, JS, and 
localisation via ResourceLoader).

Today, we focus on the smallest of these: mwmaint servers. 
This is backed by two 
heavy-duty servers, one in each data center, that autonomously run essential 
tasks at a predefined schedule (i.e. not in direct response to a user action). 
Each of these ~50 different tasks is implemented as a MediaWiki maintenance 
script. Important examples include: sending email notifications (Echo 
extension), timely pruning of sensitive PII (CheckUser extension), computing 
mentee and link recommendation data (GrowthExperiments), and reclaiming disk 
space for expired caches (core/ParserCache).

== *Why* ==

We have detailed debug performance profiling in production for web requests via 
the WikimediaDebug extension, and we have detailed profiling in local 
development for both web requests and maintenance scripts (Docker recipe).

What was missing is a way to profile maintenance scripts in production. This is 
important as maintenance scripts tend to take many minutes or hours to process 
vast amounts of production data. While generally easy to debug locally for 
functional analysis, the performance bottlenecks individual teams care about 
are likely specific to the size of the data and the performance of other 
production components.

Thanks also to Ahmon Dancy (RelEng), Giuseppe Lavagetto (SRE), and Aaron Schulz 
(Performance Team) for making this work possible, and Niklas Laxström (LangEng) 
for coming up with the idea.

== *What's New* ==

Documentation: 
https://wikitech.wikimedia.org/wiki/WikimediaDebug#Plaintext_CLI_profile

To profile a Maintenance script, run the script from the shell with *mwscript* 
as you normally would, but instead of connecting your terminal to 
mwmaint1002.eqiad.wmnet, connect to one of the *mwdebug* hosts (such as 
mwdebug1001.eqiad.wmnet). Then pass the *--profiler=text* option to generate a 
report with the performance analysis, which will be printed after the task is 
finished. Like so:

> $ mwscript showSiteStats.php --wiki=nlwiki --profiler=text
> Number of articles:  2122688
> Number of users   :  1276507
> 
> 

== *A peek behind the curtain* ==

Read on if you'd like to learn what hurdles we had to overcome for this to 
"simply" work in production, like it did for local development. The journey 
started when Niklas (WMF LangEng) proposed the idea at 
https://phabricator.wikimedia.org/T253547.

*Firstly*, the profiler engine. In 2019 (blogpost), after 
we migrated from HHVM to PHP 7, we had to look for a new profiler engine for 
backend performance. We adopted the open source php-tideways package, and this 
has powered our browser-facing profiler since. Naturally, this was already 
installed on the mwdebug servers for that purpose. However, the package, and 
the accompanying *rdtsc* setting, were only set for php-fpm (web server), it 
was not yet enabled for php-cli (command-line).

*Secondly*, the Profiler component in MediaWiki core had gotten out of sync 
with the needs of the Skin and Maintenance components. Over years of 
refactoring and more parts of each component gaining an active owner, the parts 
that lacked an owner eroded and stopped working, including the integration 
between these components. We decided to take active ownership over the 
remaining parts of MediaWiki-Core-Profiler and fix the disconnect. The meta 
work for that included identifying and re-triaging open issues under a new 
#mediawiki-core-profiler tag, 
automating discovery to our team inbox via a Phabricator Herald rule, enlisting 
on Maintainers, and 
automatic discovery of changesets to our code review dashboard.


The "output" 

[Wikitech-l]  Fresh 23.05.1 released!

2023-05-08 Thread Krinkle
TLDR: fresh-node now supports a one-off "command" invocation mode.

Learn more or install:
 https://gerrit.wikimedia.org/g/fresh
Changelog:
 https://gerrit.wikimedia.org/g/fresh/+/23.05.1/CHANGELOG.md

Each of the fresh-node scripts now supports a positional "command" argument, to 
run a single command without launching a shell first. For example: fresh-node 
-- npm install. Thanks *Gergő Tisza* and *Kosta Harlan* for their contributions!

fresh-node16 has been upgraded to include Firefox 102.10.0esr and Chromium 112. 
The same container has been in use in WMF CI for npm tests in most repos since 
12 April 2023. The welcome text saw a make-over this release, featuring a new 
minimalistic look. I hope this will make the environment feel even snappier. By 
condensing this baseline, timely warnings about enabled mount points and 
environment exposure should stand out more. *Before* / *After*



Fresh is a fast way to launch isolated environments from your terminal. These 
can be used to work more responsibly with 'npm' developer tools such as ESLint, 
QUnit, Grunt, Selenium, and more. Example guide: 
https://www.mediawiki.org/wiki/Manual:JavaScript_unit_testing. To report issues 
or browse past and current tasks, check Phabricator at 
https://phabricator.wikimedia.org/tag/fresh/.

--
Timo Tijhof,
Principal Engineer,
Wikimedia Foundation.

[Wikitech-l]  Fresh 22.11 released!

2022-11-27 Thread Krinkle
This release adds additional options to forward environment variables, and 
improves ZSH support.

Get started by installing, updating, or learning more, at:
https://gerrit.wikimedia.org/g/fresh#fresh-environment

Changelog:
https://gerrit.wikimedia.org/g/fresh/+/22.11.1/CHANGELOG.md

Thanks to Peter Hedenskog (@Peter) and Ezekiel Udoh (@EUdoh) for contributing 
to this release.

To report issues or browse past and current tasks, check Phabricator at 
https://phabricator.wikimedia.org/tag/fresh/.

Fresh is a fast way to create isolated environments from your terminal. These 
can be used to work more responsibly with 'npm' developer tools such as ESLint, 
QUnit, Grunt, Selenium, and more. Example guide: 
https://www.mediawiki.org/wiki/Manual:JavaScript_unit_testing

--
Timo Tijhof

[Wikitech-l]  Valentín Gutierrez receives Web Perf Hero award!

2022-11-21 Thread Krinkle
I’m happy to share that the second Web Perf Hero award of 2022 goes to Valentín 
Gutierrez!

This award is in recognition of Valentín’s work on the Wikimedia CDN over the 
past three months. In particular, Valentín dove deep into Apache Traffic 
Server. Valentín improved ATS backend p75 latency by 25%, and reduced ATS p99 
disk read latency by up to 1000X. 
Read more at 
https://techblog.wikimedia.org/2022/11/21/web-perf-hero-valentin-gutierrez/

Read about past recipients at Web Perf Hero award on mediawiki.org:
https://www.mediawiki.org/wiki/Wikimedia_Performance_Team/Web_Perf_Hero_award

-- Timo Tijhof, on behalf of WMF Performance Team.

[Wikitech-l] Re: Feedback wanted: PHPCS in a static types world

2022-11-15 Thread Krinkle


On Tue, 15 Nov 2022, at 11:41, Daniel Kinzler wrote:
> Am 10.11.2022 um 03:08 schrieb Tim Starling:
>> Clutter, because it's redundant to add a return type declaration when the 
>> return type is already in the doc comment. If we stop requiring doc comments 
>> as you propose, then fine, add a return type declaration to methods with no 
>> doc comment. But if there is a doc comment, an additional return type 
>> declaration just pads out the file for no reason.
> I agree that we shouldn't have redundant doc tags and return type 
> declarations. I would suggest that all methods should have a return type 
> declaration, but should not have a @return doc tag unless there is additional 
> info […]
> 
> 
> 
>> The performance impact is measurable for hot functions. In gerrit 820244 
>> I removed 
>> parameter type declarations from a private method for a benchmark 
>> improvement of 2%. 
>> 
> This raises an interesting issue, one that has bitten me before: How do we 
> know that a given method is "hot"? Maybe we should establish a @hot or 
> @performance tag to indicate that a given method should be optimized for 
> speed. […]
> 

I think the enforced and automated codesniffer could remain fairly simple: As 
today, the sniff encourages all methods to have parameter and return types 
documented in a way that humans, Phan, and IDEs can understand for static 
analysis to avoid and catch mistakes.

What I propose we change is that instead of enforcing this solely through a 
mandatory doc comment, enforce it by requiring at least one of them to be 
present. Either parameters and returns are typed, or a doc block exists. Both 
may exist, of course.

We've established in this email thread that it can be cluttering (and a waste of 
effort) to require repeating information when doing so adds no value. It is 
also my understanding that Phan and IDEs already understand either and both so 
we don't need them to be aware of which "should" exist.

Is there value in enforcing removal of existing doc blocks after someone has 
written them? This seems to me like potentially a significant time sink with no 
return on that other than because we enforced it as a new rule. If we agree there is 
no urgency in removing existing doc blocks or actively blocking CI when someone 
chooses to write a doc block, then afaik we do not need new annotations like 
"hot" or "performance" or some other tag to suppress warnings about doc blocks.

I do think it is important to preserve author intent when it comes to 
performance optimisations. However these are by no means limited to this new 
notion of saving native type overhead. There are all sorts of code 
optimisations. I believe we typically document these through an inline comment 
like "Optimization: ..." next to the code in question, in which the need for 
optimisation and sometimes (if non-obvious) how that optimisation is achieved, 
are mentioned. That should suffice I think in preserving the use case and e.g. 
prevent someone from re-introducing typing where it was previously removed for 
perf reasons.

In other words: Codesniffer helps us avoid unknown types (in docblock and/or 
native type), and inline comments remind us about past performance 
optimisations. Do we need more? If so, what is the benefit/usecase for more? 
What do we risk if we don't?

-- Timo

[Wikitech-l] Zuul status page: redesign and perf

2022-11-03 Thread Krinkle
I've redesigned the Zuul status page at 
https://integration.wikimedia.org/zuul/ (or view demo if it's idle/empty).

This was the last of our microsites and productivity tools where we still used 
the Twitter Bootstrap design, rather than Wikimedia Design Style Guide. 
It now fits well with the others, 
such as doc.wikimedia.org, research.wikimedia.org, and 
performance.wikimedia.org.

As a fun exercise, I also optimised it. It now transfers 80% less data and 
loads up to 5X faster (time to visual complete). The details may be of interest 
to others as an example of iterative refactoring on legacy code (code linked 
below).

For a visual side-by-side and an overview of the bug fixes, design tweaks, and 
perf improvements, refer to my post at:
https://phabricator.wikimedia.org/T322168#8368581

Source commits: 
https://gerrit.wikimedia.org/r/q/project:integration/docroot+file:zuul+is:merged+owner:Krinkle+before:%25222022-11-03+22:00:00%2522+after:2022-01-01

--
Timo Tijhof

[Wikitech-l] [BREAKING CHANGE] Remove AlphabeticPager::getOrderTypeMessages()

2022-10-28 Thread Krinkle
This was introduced in March 2008 with r32228 (ad178edb80) for 
use in the CategoryPager of Special:Categories, but almost immediately removed 
again from there, on the same day, in r32259 (bdf9431795) 
because it wasn't actually supported.

There has been no other use of it to my knowledge, and it has remained 
undocumented and without tests.

I'm proposing removal without deprecation (per Stable interface policy#Removal). 
Note that if 
you did use it outside publicly hosted/indexed repos, you don't risk visible 
breakage and require no migration. Unused method overrides in subclasses are 
silently ignored.

https://gerrit.wikimedia.org/r/c/mediawiki/core/+/850612/ 


-- Timo

[Wikitech-l] Infrastructure diagrams

2022-10-24 Thread Krinkle
I've done a major update to a number of diagrams on Wikitech.

Usually, I don't mention an update here, but I'm highlighting it now as it's 
been a while since we mentioned them on-list and the community and foundation 
have grown a lot so some of these may be new to you.

Given how much has changed in recent years, I also included a changelog and a 
link to where in the docs you'd normally discover this diagram on-wiki:

*== 1. File:Wikipedia_webrequest_2022.png (Updated) ==*

This is a highly simplified diagram, covering the general shape of our stack 
through the example of a typical Wikipedia webrequest.
Previous: 
https://upload.wikimedia.org/wikipedia/commons/b/b3/Wikipedia_webrequest_flow_2020.png
New: 
https://upload.wikimedia.org/wikipedia/commons/4/4d/Wikipedia_webrequest_2022.png
Documentation: wikitech:MediaWiki_at_WMF and wikitech:Caching_overview.
Notable changes:
* Change edge TLS termination ("HTTPS") from ats-tls to HAProxy. I wrote a 
"Caching overview § History" section.
* Change appserver TLS from Nginx to Envoy.
* Add new MainStash DB.
* Include storage ExternalStore DB, ParserCache DB, and Swift media.
* Include services Shellbox, Mathoid, and Kask.

*== 2. File:WMF_infrastructure_2022.png (Updated) ==*

This is a continuous attempt at an overview of tier-1/user-facing 
infrastructure. It will likely never be complete from every point of view, but 
it is more accurate and complete than it has been. Thanks to all who 
contributed by entertaining my many questions over the years.

Previous (2016 by Elukey): 
https://upload.wikimedia.org/wikipedia/labs/4/4d/Infrastructure_overview.png
New: 
https://upload.wikimedia.org/wikipedia/commons/4/48/WMF_infrastructure_2022.png
Documentation: wikitech:Wikimedia_infrastructure and wikitech:Purged
Notable changes:
* Add new Drmrs data center in Marseille, France.
* Add new services: purged.go, EventStreams, Thumbor, mcrouter, Envoy, etcd.
* Add new distinction for Multi-DC between primary and secondary data center.
* Change sessionstore from Redis to Kask/Cassandra.
* Change jobqueue from Redis to EventGate/Kafka.
* Include distinct MediaWiki server roles and clusters.
* Include high-level MediaWiki platform components.
* Include example flow for "JobQueue job" and "CDN purge".

*== 3. File:MediaWiki_infrastructure_2022.png (New) ==*

Similar to the WMF Infra diagram, but more abstract around DC and services, and 
more detailed within the platform. It includes more core services, and 
recognises extensions as their own layer.

New: 
https://upload.wikimedia.org/wikipedia/commons/e/ee/MediaWiki_infrastructure_2022.png
Documentation: wikitech:MediaWiki_at_WMF 


*== 4. File:Wikipedia_Memcached_flow_2022.png (Updated) ==*

Previous: 
https://upload.wikimedia.org/wikipedia/commons/d/db/Wikipedia_Memcached_flow_2020.png
New: 
https://upload.wikimedia.org/wikipedia/commons/4/45/Wikipedia_Memcached_flow_2022.png
Documentation: wikitech:Memcached_for_MediaWiki 

Notable changes:
* Include the three tiers of ParserCache.
* Add WANCache legend to explain different keytypes you may encounter on the 
network.
* Add full name of the mcrouter-with-onhost-tier service for greppability.
* Add new WRStats service (T310662). This was part of Multi-DC work to reduce 
primary DB writes and (not bi-di replicated) Redis use in AbuseFilter. This 
service also replaces the old "User ping limiter" in core and is now able to 
serve both use cases.
* Remove "on-host: soon" labels. Adopting on-host memc for WANCache was 
considered not worth the added runtime complexity (T264604). Note that SRE's 
work on adding 10G network links for memcached hosts, and the addition of 
mcrouter-managed gutter pools, take care of the general use case that we were 
exploring on-host for. We kept it for ParserCache, however (T244340).

*== Edit link ==*

As before, each diagram file page has an "Edit" link in the description that 
takes you directly to the open-source Diagrams.net web app (loading file 
read-only from Google Drive). You can fork by using "Save as" in the web app. 
See also: 

[Wikitech-l]  Fresh 22.09 released!

2022-10-11 Thread Krinkle
This release promotes Node.js 16 to be the default runtime in fresh-node.

Get started by installing, updating, or learning more, at:
https://gerrit.wikimedia.org/g/fresh#fresh-environment

Changelog:
https://gerrit.wikimedia.org/g/fresh/+/22.09.1/CHANGELOG.md

Node.js 16 is now the default environment for the fresh-node command. This 
addresses an issue with Vue.js development using Vite 3, which did not work 
well under Node.js 14. (Thanks Lucas Werkmeister, T314051).

Note that most projects still test on Node.js 14 in WMF CI. [1] If you find 
that some of your tests don't yet pass under Node.js 16, fret not! Node.js 14 
and 12 remain available via the bundled fresh-node14 and fresh-node12 commands.

If you encounter problems in Fresh with Node.js 16 or experience other issues, 
let us know on Phabricator at https://phabricator.wikimedia.org/tag/fresh/. 
This is also where you can browse previous tasks.

Fresh is a fast way to create isolated environments from your terminal. These 
can be used to work more responsibly with 'npm' developer tools such as ESLint, 
QUnit, Grunt, Selenium, and more. Example guide: 
https://www.mediawiki.org/wiki/Manual:JavaScript_unit_testing

--
Timo Tijhof

[1] https://phabricator.wikimedia.org/T314470

[Wikitech-l] Production Excellence #46: July & August 2022

2022-09-11 Thread Krinkle
How are we doing in our strive for operational excellence? Read on to find out!

Incidents

7 documented incidents in July, and 4 in August (Incident graphs 
<https://codepen.io/Krinkle/full/wbYMZK>). Read more about past incidents at 
Incident status <https://wikitech.wikimedia.org/wiki/Incident_status> on 
Wikitech.

2022-07-03 shellbox 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-07-03_shellbox_request_spike>
Impact: For 16 minutes, edits and previews for pages with Score musical notes 
were slow or unavailable.

2022-07-10 thumbor 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-07-10_thumbor>
Impact: For several days, Thumbor p75 service response times gradually 
regressed by several seconds.

2022-07-11 FrontendUnavailable cache text 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-07-11_FrontendUnavailable_cache_text>
Impact: For 5 minutes, the MediaWiki API cluster in eqiad responded with higher 
latencies or errors.

2022-07-11 Shellbox and parsoid saturation 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-07-11_Shellbox_and_parsoid_saturation>
Impact: For 13 minutes, the mobileapps service was serving HTTP 503 errors to 
clients.

2022-07-12 codfw A5 power cycle 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-07-12_codfw_A5_powercycle>
Impact: No observed public-facing impact. Internal clean up took some work, 
e.g. for Ganeti VMs.

2022-07-13 eqsin bandwidth 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-07-13_brief_outbound_bandwidth_spike_eqsin>
Impact: For 20 minutes, there was a small increase in error responses for 
thumbnails served from the Eqsin data center (Singapore).

2022-07-20 eqiad network 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-07-20_network_interruption>
Impact: For 10-15 minutes, a portion of wiki traffic from Eqiad-served regions 
was lost (about 1M uncached requests). For ~30 minutes, Phabricator was unable 
to access its database.

2022-08-10 cassandra disk space 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-08-10_cassandra_disk_space>
Impact: During planned downtime, other hosts ran out of space due to 
accumulating logs. No external impact.

2022-08-10 confd all hosts 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-08-10_confd_all_hosts>
Impact: No external impact.

2022-08-16 Beta Cluster 502 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-08-16_Beta_Cluster_502>
Impact: For 7 hours, all Beta Cluster sites were unavailable.

2022-08-16 x2 database replication 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-08-16_x2_databases_replication_breakage>
Impact: For 36 minutes, errors were noticeable for some editors. Saving edits 
was unaffected.


Incident follow-up
Recently completed incident follow-up:

Replace certificate on elastic09 in Beta Cluster 
<https://phabricator.wikimedia.org/T315386>
Brian (bking, WMF Search) noticed during an incident review that an internal 
server used an expired cert and renewed it in accordance with a documented 
process.

Localisation cache must be purged after train deploy 
<https://phabricator.wikimedia.org/T263872>
Tchanders (WMF AHT) filed this in 2020 after a recurring issue with stale 
interface labels. Work led by Ahmon (dancy, WMF RelEng).

Remember to review and schedule Incident Follow-up work 
<https://phabricator.wikimedia.org/project/view/4758/> in Phabricator! These 
are preventive measures and tech debt mitigations written down after an 
incident is concluded.
Highlight from the "Oldest incident follow-up" query:

 * T83729 <https://phabricator.wikimedia.org/T83729> Fix monitoring of 
poolcounter service.



Trends

The month of July saw 22 new production errors 
<https://phabricator.wikimedia.org/maniphest/query/XHYmsxx4VNRI/#R> of which 9 
are still open today. In August we encountered 29 new production errors 
<https://phabricator.wikimedia.org/maniphest/query/BnX.PiwEomZt/#R> of which 10 
remain open today and have carried over to September.

Take a look at the workboard and look for tasks that could use your help.
→  https://phabricator.wikimedia.org/tag/wikimedia-production-error/ 

 *Did you know?* To zoom in and find your team's error reports, use the 
appropriate "Filter" link in the sidebar 
<https://phabricator.wikimedia.org/tag/wikimedia-production-error/> of the 
workboard.
For the month-over-month numbers, refer to the spreadsheet data 
<https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml>.

Thanks!

Thank you to everyone who helped by reporting, investigating, or resolving 
problems in Wikimedia production. Thanks!

Until next time,

– Timo Tijhof



 Share or read later via 
https://phabricator.wikimedia.org/phame/post/view/296/ 



[Wikitech-l] Production Excellence #45: June 2022

2022-07-29 Thread Krinkle
How are we doing in our strive for operational excellence? Read on to find out!

Incidents
There were 6 incidents in June this year. That's double the median of three per 
month, over the past two years (Incident graphs 
<https://codepen.io/Krinkle/full/wbYMZK>).

2022-06-01 cloudelastic 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-06-01_Lost_index_in_cloudelastic>
Impact: For 41 days, Cloudelastic was missing search results about files from 
commons.wikimedia.org.

2022-06-10 overload varnish haproxy 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-06-10_overload_varnish_haproxy>
Impact: For 3 minutes, wiki traffic was disrupted in multiple regions for 
cached and logged-in responses.

2022-06-12 appserver latency 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-06-12_appserver_latency>
Impact: For 30 minutes, wiki backends were intermittently slow or unresponsive, 
affecting a portion of logged-in requests and uncached page views.

2022-06-16 MariaDB password 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-06-16_MariaDB_password_leak>
Impact: For 2 hours, a current production database password was publicly known. 
Other measures ensured that no data could be compromised (e.g. firewalls and 
selective IP grants).

2022-06-21 asw-a2-codfw power 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-06-21_asw-a2-codfw_accidental_power_cycle>
Impact: For 11 minutes, one of the Codfw server racks lost network 
connectivity. Among the affected servers was an LVS host. Another LVS host in 
Codfw automatically took over its load balancing responsibility for wiki 
traffic. During the transition, there was a brief increase in latency for 
regions served by Codfw (Mexico, and parts of US/Canada).

2022-06-30 asw-a4-codfw power 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-06-30_asw-a4-codfw_accidental_power_cycle>
Impact: For 18 minutes, servers in the A4-codfw rack lost network connectivity. 
Little to no external impact.

Incident follow-up
Recently completed incident follow-up:


Audit database usage of GlobalBlocking extension 
<https://phabricator.wikimedia.org/T307648>
Filed by Amir (Ladsgroup) in May following an outage due to DB load from 
GlobalBlocking. Amir reduced the extension's DB load by 10% by skipping checks 
for edit traffic from WMCS and Toolforge. He also implemented stats for 
monitoring GlobalBlocking DB queries going forward.


Reduce Lilypond shellouts from VisualEditor 
<https://phabricator.wikimedia.org/T312319>
Filed by Reuven (RLazarus) and Kunal (Legoktm) after a shellbox incident. Ed 
(Esanders) and Sammy (TheresNoTime) improved the Score extension's VisualEditor 
plugin to increase its debounce duration.

Remember to review and schedule Incident Follow-up work 
<https://phabricator.wikimedia.org/project/view/4758/> in Phabricator! These 
are preventive measures and tech debt mitigations written down after an 
incident is concluded. Read more about past incidents at Incident status 
<https://wikitech.wikimedia.org/wiki/Incident_status> on Wikitech.

Trends

In June and July (which is almost over), we reported 27 new production errors 
<https://phabricator.wikimedia.org/maniphest/query/WDqlrITVmIoX/#R> and 25 new 
production errors 
<https://phabricator.wikimedia.org/maniphest/query/pzOAOpbnF3PX/#R> 
respectively. Of these 52 new issues, 27 were closed in the weeks since then, 
and 25 remain unresolved and will carry over to August.

We also addressed 25 stagnant problems that we carried over from previous 
months, thus the workboard overall remains at exactly 299 unresolved production 
errors.

Take a look at the Wikimedia-production-error 
<https://phabricator.wikimedia.org/tag/wikimedia-production-error/> workboard 
and look for tasks that could use your help.

 *Did you know?* To zoom in and find your team's error reports, use the 
appropriate "Filter" link in the sidebar of the workboard.
For the month-over-month numbers, refer to the spreadsheet data 
<https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml>.

Thanks!

Thank you to everyone who helped by reporting, investigating, or resolving 
problems in Wikimedia production. Thanks!

Until next time,

– Timo Tijhof




 Share or read later via 
https://phabricator.wikimedia.org/phame/post/view/292/ 


[Wikitech-l] Production Excellence #44: May 2022

2022-06-15 Thread Krinkle
How’d we do in our strive for operational excellence last month? Read on to 
find out!

Incidents 
By golly, we've had quite the month! 10 documented incidents, which is more 
than three times the two-year median of 3. The last time we experienced ten or 
more incidents in one month was June 2019, when we had eleven (Incident graphs 
<https://codepen.io/Krinkle/full/wbYMZK>, Excellence monthly of June 2019 
<https://phabricator.wikimedia.org/phame/post/view/163/production_excellence_12_june_2019/>).

I'd like to draw your attention to something positive. As you read the below, 
take note of incidents that did *not* impact public services, and did *not* 
have lasting impact or data loss. For example, the Apache incident 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-05-24_Failed_Apache_restart>
 benefited from PyBal's automatic health-based depooling. The deployment server 
incident <https://wikitech.wikimedia.org/wiki/Incidents/2022-05-02_deployment> 
recovered without loss thanks to Bacula. The Etcd incident 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-05-01_etcd> impact was 
limited by serving stale data. And, the Hadoop incident 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-05-31_Analytics_Data_Lake_-_Hadoop_Namenode_failure>
 recovered by resuming from Kafka right where it left off.

2022-05-01 etcd <https://wikitech.wikimedia.org/wiki/Incidents/2022-05-01_etcd>
Impact: For 2 hours, Conftool could not sync Etcd data between our core data 
centers. Puppet and some other internal services were unavailable or out of 
sync. The issue was isolated, with no impact on public services.

2022-05-02 deployment server 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-05-02_deployment>
Impact: For 4 hours, we could not update or deploy MediaWiki and other 
services, due to corruption on the active deployment server. No impact on 
public services.

2022-05-05 site outage 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-05-05_Wikimedia_full_site_outage>
Impact: For 20 minutes, all wikis were unreachable for logged-in users and 
non-cached pages. This was due to a GlobalBlocks schema change causing 
significant slowdown in a frequent database query.

2022-05-09 Codfw confctl 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-05-09_confctl>
Impact: For 5 minutes, all web traffic routed to Codfw received error 
responses. This affected central USA and South America (local time after 
midnight). The cause was human error and lack of CLI parameter validation.

2022-05-09 exim-bdat-errors 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-05-09_exim-bdat-errors>
Impact: Over five days, about 14,000 incoming emails from Gmail users to 
wikimedia.org were rejected and returned to sender.

2022-05-21 varnish cache busting 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-05-21_varnish_cache_busting>
Impact: For 2 minutes, all wikis and services behind our CDN were unavailable 
to all users.

2022-05-24 failed Apache restart 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-05-24_Failed_Apache_restart>
Impact: For 35 minutes, numerous internal services that use Apache on the 
backend were down. This included Kibana (logstash) and Matomo (piwik). For 20 
of those minutes, there was also reduced MediaWiki server capacity, but no 
measurable end-user impact for wiki traffic.

2022-05-25 de.wikipedia.org 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-05-25_de.wikipedia.org>
Impact: For 6 minutes, a portion of logged-in users and non-cached pages 
experienced a slower response or an error. This was due to increased load on 
one of the databases.

2022-05-26 m1 database hardware 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-05-26_Database_hardware_failure>
Impact: For 12 minutes, internal services hosted on the m1 database (e.g. 
Etherpad) were unavailable or at reduced capacity.

2022-05-31 Analytics Hadoop failure 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-05-31_Analytics_Data_Lake_-_Hadoop_Namenode_failure>
Impact: For 1 hour, all HDFS writes and reads were failing. After recovery, 
ingestion from Kafka resumed and caught up. No data loss or other lasting 
impact on the Data Lake.

Incident follow-up 
Recently completed incident follow-up:

Invalid confctl selector should either error out or select nothing 
<https://phabricator.wikimedia.org/T308100>
Filed by Amir (@Ladsgroup <https://phabricator.wikimedia.org/p/Ladsgroup/>) 
after the confctl incident this past month. Giuseppe (@Joe 
<https://phabricator.wikimedia.org/p/Joe/>) implemented CLI parameter 
validation to prevent human error from causing a similar outage in the future.

Backup opensearch dashboards data <https://phabricator.wikimedia.org/T237224>
Filed back in 2019 by Filippo (@fgiunchedi 
<https://phabricator.wikimedia.org/p/fgiunchedi/>). The OpenSearch homepage 
dashbo

[Wikitech-l]  Amir Sarabadani receives Web Perf Hero award!

2022-05-26 Thread Krinkle
I'm happy to share that the first Web Perf Hero award of 2022 goes to Amir 
Sarabadani!

This award is in recognition of Amir's work (@Ladsgroup) over the past six 
months, in which he demonstrated deep expertise in the MediaWiki platform and 
significantly sped up the process of saving edits in MediaWiki. This improved 
both the potential of MediaWiki core and the experience on WMF wikis in 
practice, especially on Wikidata.org.

Refer to the mediawiki.org page below to read about what it took to *cut 
latencies by half*:
https://www.mediawiki.org/wiki/Wikimedia_Performance_Team/Web_Perf_Hero_award#Amir_Sarabadani

This award is given on a quarterly basis, and manifests as a Phabricator badge:
https://phabricator.wikimedia.org/badges/view/17/

-- Timo Tijhof, on behalf of WMF Performance Team.

[Wikitech-l] Production Excellence #43: April 2022

2022-05-12 Thread Krinkle
How’d we do in our strive for operational excellence last month? Read on to 
find out!

Incidents 
Last month we experienced 2 (public) incidents. This is below the three-year 
median of 3 incidents a month (Incident graphs 
<https://codepen.io/Krinkle/full/wbYMZK>).

2022-04-06 esams network 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-04-06_esams_network>
Impact: For 30 minutes, wikis were slow or unreachable for a portion of clients 
to the Esams data center. Esams is one of two DCs primarily serving Europe, 
Middle East, and Africa.

2022-04-26 cr2-eqord down 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-04-26_cr2-eqord_down>
Impact: No external impact. Internally, for 2 hours we were unable to access 
our Eqord routers by any means. This was due to a fiber cut on a redundant link 
to Eqiad, which then coincided with planned vendor maintenance on the links to 
Ulsfo and Eqiad. See also Network design 
<https://wikitech.wikimedia.org/wiki/Network_design>.



Incident follow-up 
Remember to review and schedule Incident Follow-up work 
<https://phabricator.wikimedia.org/project/view/4758/> in Phabricator, which 
are preventive measures and tech debt mitigations written down after an 
incident is concluded. Read more about past incidents at Incident status 
<https://wikitech.wikimedia.org/wiki/Incident_status> on Wikitech.

Recently resolved incident follow-up:

Reduce mysql grants for wikiadmin scripts 
<https://phabricator.wikimedia.org/T249683>
Filed in 2020 after the wikidata drop-table incident (details 
<https://wikitech.wikimedia.org/wiki/Incidents/2020-04-07_Wikidata%27s_wb_items_per_site_table_dropped>).
 Carried out over the last six months by Ladsgroup (SRE Data Persistence).

Improve reliability of Toolforge k8s cron jobs 
<https://phabricator.wikimedia.org/T308204> and Re-enable CronJobControllerV2 
<https://phabricator.wikimedia.org/T308205>
Filed earlier this week after a Toolforge incident and carried out by Majavah.


Trends 
During the month of April we reported 27 new production errors 
<https://phabricator.wikimedia.org/maniphest/query/OZ99DkeJf85D/#R>. Of these 
new errors, we resolved 14, and the remaining 13 are still open and have 
carried over to May.

Last month, the workboard totalled 298 unresolved error reports. Of these older 
reports that carried over from previous months, 16 were resolved. Most of these 
were reports from before 2019.

The new total, including some tasks for the current month of May, is 292. A 
slight decrease! (spreadsheet 
<https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml>).

Take a look at the workboard and look for tasks that could use your help.

→  https://phabricator.wikimedia.org/tag/wikimedia-production-error/ 




Thanks! 

Thank you to everyone who helped by reporting, investigating, or resolving 
problems in Wikimedia production. Thanks!

Until next time,

– Timo Tijhof



 Share or read later via https://phabricator.wikimedia.org/phame/post/view/284/

[Wikitech-l]  Fresh 22.05 released!

2022-05-10 Thread Krinkle
This release promotes Node.js 14 to be the default for the fresh-node command.

Get started by installing, updating, or learning more, at:
https://gerrit.wikimedia.org/g/fresh#fresh-environment

Changelog:
https://gerrit.wikimedia.org/g/fresh/+/22.05.1/CHANGELOG.md

Node.js 14 is now the default environment for the fresh-node command. This 
follows WMF CI (Thanks James Forrester!). [1] Node.js 12 remains available via 
fresh-node12. Node.js 10 has been removed after being deprecated since 
September of last year.

This release also adds support for detecting and installing to your home 
directory instead of system-wide, which is often preferred on Linux. The 
installer automatically selects one of $HOME/bin or $HOME/.local/bin if it 
exists and is in your "PATH". (Thanks Antoine Musso!)
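
As a rough illustration of the selection rule described above (this is a 
sketch, not the installer's actual code): the function name 
`pick_user_bin_dir`, its parameters, and the preference for $HOME/bin over 
$HOME/.local/bin when both qualify are assumptions made for this example.

```python
import os


def pick_user_bin_dir(home, path_env, dir_exists=os.path.isdir):
    """Return a per-user bin directory to install into, or None.

    Hypothetical helper: a candidate qualifies only if it exists AND is
    listed in PATH. The candidate order is an assumption, not confirmed
    behaviour of the Fresh installer.
    """
    # Split a POSIX-style PATH and ignore trailing slashes on entries.
    path_entries = {entry.rstrip("/") for entry in path_env.split(":") if entry}
    for candidate in (f"{home}/bin", f"{home}/.local/bin"):
        if dir_exists(candidate) and candidate in path_entries:
            return candidate
    # No suitable directory: the caller would fall back to a system-wide install.
    return None


if __name__ == "__main__":
    print(pick_user_bin_dir(os.path.expanduser("~"), os.environ.get("PATH", "")))
```

A return value of None here stands for "fall back to the system-wide 
location".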

If you encounter problems with Node.js 14 or experience other issues, let us 
know on Phabricator at https://phabricator.wikimedia.org/tag/fresh/. This is 
also where you can browse previous tasks.

Fresh is a fast way to create isolated environments from your terminal. These 
can be used to work more responsibly with 'npm' developer tools such as ESLint, 
QUnit, Grunt, Selenium, and more. Example guide: 
https://www.mediawiki.org/wiki/Manual:JavaScript_unit_testing

--
Timo Tijhof

[1] https://phabricator.wikimedia.org/T267890

[Wikitech-l] Production Excellence #42: March 2022

2022-04-21 Thread Krinkle
How’d we do in our strive for operational excellence last month? Read on to 
find out!

Incidents
We've had quite the month, with 8 documented incidents. That's more than double 
the two-year median of three a month (Incident graphs 
<https://codepen.io/Krinkle/full/wbYMZK>).

2022-03-01 ulsfo network 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-03-01_ulsfo_network>
Impact: For 20 minutes, clients normally routed to Ulsfo were unable to reach 
our projects. This includes New Zealand, parts of Canada, and the United States 
west coast.

2022-03-04 esams availability banner sampling 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-03-04_esams_availability_banner_sampling>
Impact: For 1.5 hours, all wikis were largely unreachable from Europe (via 
Esams), with more limited impact across the globe via other data centers as 
well.

2022-03-06 wdqs-categories 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-03-06_wdqs-categories>
Impact: For 1.5 hours, some requests to the public Wikidata Query Service API 
were sporadically blocked.

2022-03-10 site availability 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-03-10_MediaWiki_availability>
Impact: For 12 minutes, all wikis were unreachable to logged-in users, and to 
unregistered users trying to access uncached content.

2022-03-27 api <https://wikitech.wikimedia.org/wiki/Incidents/2022-03-27_api>
Impact: For ~4 hours, in three segments of 1-2 hours each over two days, there 
were higher levels of failed or slow MediaWiki API requests.

2022-03-27 wdqs outage 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-03-27_wdqs_outage>
Impact: For 30 minutes, all WDQS queries failed due to an internal deadlock.

2022-03-29 network 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-03-29_network>
Impact: For approximately 5 minutes, Wikipedia and other Wikimedia sites were 
slow or inaccessible for many users, mostly in Europe/Africa/Asia. (Details not 
public at this time.)

2022-03-31 api errors 
<https://wikitech.wikimedia.org/wiki/Incidents/2022-03-31_api_errors>
Impact: For 22 minutes, API server and app server availability were slightly 
decreased (~0.1% errors, all for s7-hosted wikis such as Spanish Wikipedia), 
and the latency of API servers was elevated as well.




Incident follow-up

Remember to review and schedule Incident Follow-up (Sustainability) 
<https://phabricator.wikimedia.org/project/view/4758/> in Phabricator, which 
are preventive measures and tech debt mitigations written down after an 
incident is concluded. Read more about past incidents at Incident status 
<https://wikitech.wikimedia.org/wiki/Incident_status> on Wikitech. Some 
recently completed sustainability work:

Add linecard diversity to router-to-router interconnect at Codfw 
<https://phabricator.wikimedia.org/T248506>
Filed by Chris (SRE Infra) in 2020 after an incident where all hosts in the 
Codfw data center lost connectivity at once. Completed by Arzhel and Cathal 
(SRE Infra), and Papaul (DC Ops); including in Esams where the same issue 
existed.

Expand parser tests to cover language conversion variants in 
table-of-contents output <https://phabricator.wikimedia.org/T295187>
Suggested and carried out by CScott (Parsoid) after reviewing an incident in 
November. The TOC on wikis that rely on the LanguageConverter service (such as 
Chinese Wikipedia) was no longer localized.

Fix unquoted URL parameters in Icinga health checks 
<https://phabricator.wikimedia.org/T304323>
Suggested by Riccardo (SRE Infra) in response to an early warning signal for 
TLS certificate expiry. He realized that automated checks for a related cluster 
were still claiming to be in good health, when they in fact should have been 
firing a similar warning. Carried out by Filippo and Dzahn.

Provide automation to quickly show replication status when primary is down 
<https://phabricator.wikimedia.org/T281249>
Filed in April by Jaime (SRE Data Persistence), carried out by John and 
Ladsgroup.




Trends

Since the last edition, we resolved 24 of the 301 unresolved errors that 
carried over from previous months.
In March, we created 54 new production errors 
<https://phabricator.wikimedia.org/maniphest/query/ryOkF_JP6cV1/#R>. That's 
quite high compared to the twenty-odd reports we find most months. Of these, 17 
remain open today a month later.

In the month of April, so far, we reported 20 new errors 
<https://phabricator.wikimedia.org/maniphest/query/1LEA6jQzf7iU/#R> of which 
also 17 remain open today.

The production error workboard once again adds up to exactly 298 open tasks 
(spreadsheet 
<https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml>).


Take a look at the workboard and look for tasks that could use your help.
→  https://phabricator.wikimedia.org/tag/wikimedia-production-error/ 


[Wikitech-l] Re: Different cache invalidation rules for similar users?

2022-04-06 Thread Krinkle
On Mon, 4 Apr 2022, at 10:12, Strainu wrote:
> Thank you for your responses folks. The script is a gaget [1], loaded
> and unloaded through the preferences.
> 
> Regards,
>Strainu
> 
> [1] https://ro.wikipedia.org/wiki/MediaWiki:Gadget-wikidata-description.js
> 

This page has a history of two revisions, both 25 Mar, about 10 minutes apart.

Is the reported issue that its last edit [1] was seemingly not applied for some 
editors? E.g. they kept getting the previous version with the getElementByID 
error?

-- Krinkle

[1] 
https://ro.wikipedia.org/w/index.php?title=MediaWiki%3AGadget-wikidata-description.js&type=revision&diff=14854456&oldid=14854443

[Wikitech-l] Re: Different cache invalidation rules for similar users?

2022-04-03 Thread Krinkle
On Sun, 3 Apr 2022, at 17:57, Strainu wrote:
> Hi,
> 
> I've recently seen some complaints from 2 users located in the same country 
> that it takes about half a day for the Javascript changes to propagate. Users 
> from different countries but similar user rights don't seem to have this 
> problem.
> 
> Is it possible to have different cache invalidation rules for different 
> countries? If not, what else could cause this behavior?

It depends on what kind of changes and to what piece of JavaScript code.

My guess would be that this is a change not to deployed software, gadgets, or 
site scripts, but to a user script; that the user script is loaded by URL via 
importScriptURI or mw.loader.load; and that the URL is non-standard (e.g. not 
exactly /w/index.php?title=..&action=raw&ctype=text/javascript, but with other 
parameters, a different order, or a different encoding). This means that it is 
not purged on edits.

In that case, it will stay cached. It might then be that someone near one data 
center is lucky that the URL was not used there before, and so sees no cached 
copy. Or that near another data center the URL is not popular enough to stay in 
the CDN and thus falls out before the 7 day expiry, despite no observed edit or 
purge.
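
A minimal sketch of why a reordered or re-encoded URL can miss the purge. It 
assumes a simplified cache that keys on the literal URL string, and that the 
purged form is /w/index.php?title=..&action=raw&ctype=text/javascript. The page 
title "User:Example/script.js" and both helper functions are hypothetical, and 
the real CDN may normalise URLs in ways this model ignores.

```python
from urllib.parse import parse_qsl, urlsplit

# Assumed canonical (purged) form; the title is made up for illustration.
CANONICAL = "/w/index.php?title=User:Example/script.js&action=raw&ctype=text/javascript"

# Same request, but with parameters reordered or the title percent-encoded.
reordered = "/w/index.php?action=raw&title=User:Example/script.js&ctype=text/javascript"
encoded = "/w/index.php?title=User%3AExample%2Fscript.js&action=raw&ctype=text/javascript"


def same_request(a, b):
    """True if both URLs have the same path and the same decoded parameters."""
    pa, pb = urlsplit(a), urlsplit(b)
    return pa.path == pb.path and sorted(parse_qsl(pa.query)) == sorted(parse_qsl(pb.query))


def same_cache_key(a, b):
    """Simplified cache model: the key is the exact URL string."""
    return a == b


for variant in (reordered, encoded):
    # Each variant fetches the same script, yet is a different cache entry,
    # so a purge of CANONICAL would leave it untouched in this model.
    print(same_request(CANONICAL, variant), same_cache_key(CANONICAL, variant))
```

Under these assumptions, only a URL that matches the canonical string exactly 
would be refreshed by an edit; the variants would live on until they expire.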

To know for sure, I would need to see the specific script edit and how the 
script is loaded.

-- Krinkle

[Wikitech-l]  SD0001 receives Web Perf Hero award!

2022-03-30 Thread Krinkle
I'm happy to share that the next Web Perf Hero award goes to SD0001, in 
recognition of their many contributions and positive impact on the performance 
of Wikimedia software. I'll share two major examples.

SD0001 implemented Package files for Gadgets (T198758). This enables gadget 
maintainers to bundle JSON files, unpacked via `require()`. This improves 
performance by avoiding delays from extra web requests. It also improves 
security by allowing safe contributions to JSON pages, as pure data with 
validated syntax on-edit. Previously, admins on Wikimedia wikis, for example, 
would need script editing access for this and rely on copy-paste instructions 
from another person via the talk page.

SD0001 also introduced `Module::getSkins` in ResourceLoader, and used it in the 
startup module to optimise away unneeded module registrations. We just shipped 
the first adoption of this for Gadgets (T236603). In the future, we'll use this 
to optimise MediaWiki's own skin modules as well.

This award is given once a quarter, and manifests as a Phabricator badge.

More information and past recipients:
https://www.mediawiki.org/wiki/Wikimedia_Performance_Team/Web_Perf_Hero_award

Phabricator badge:
https://phabricator.wikimedia.org/badges/view/17/

-- Timo Tijhof, on behalf of WMF Performance Team.

[Wikitech-l] MediaWiki 1.38 release blockers

2022-03-23 Thread Krinkle
I'd like to draw some attention to the current outstanding release blockers.

From https://phabricator.wikimedia.org/tag/mw-1.38-release/,
8 tasks as queried from 
https://phabricator.wikimedia.org/maniphest/query/2YXgSfsvECyO/#R

1. T261329 Prepare Parsoid for MW 1.38 (ideally) 

2. T265518 Move Parsoid ServiceWorker.php and extension/src/Config into core 

3. T302117 ZeroConf VE for MW 1.38 
> Component: Parsoid.
> Assigned to: WMF Content Transformation, formerly known as Parsing team.

If this is finished in master, it may be worth checking that relevant docs are 
up-to-date, and that relevant changes have made it to the branch in time (or 
are backported).

4. T275246 Populate rev_actor and rev_comment_id 

> Component: MediaWiki-Revision-backend.
> Assigned to: WMF Platform Engineering (@tstarling).

Originally planned for MW 1.36. Last year, change 684142 implemented the 
migration.

I moved it to the MW 1.39 milestone as this isn't something we can (nor should) 
rush through code changes alone. Per Tim's comment, we should add this to the 
installer/updater after we've successfully migrated WMF's.

5. T288686 WVUI's TypeaheadSearch should work with a non-default 
`$wgScriptPath` 
> Component: Vector 22.
> Assigned to: Readers Web.

The new Vector 22 skin isn't yet used by default, but it currently hardcodes 
WMF-specific configuration and is bundled with the release. There is an open 
patch pending review.

6. T294612 Raw HTML from Language Converters' title conversion displayed 

> Component: MediaWiki-Parser, MediaWiki-Language-converter.
> Assigned to: WMF Content Transformation, formerly known as Parsing team.

A regression reported in October that appears unsolved in prod (the linked 
example still displays raw HTML) and is currently set to block the release.

7. T295187 Chinese conversion no longer work in the table of content 

> Component: MediaWiki-Parser, MediaWiki-Language-converter.
> Assigned to: WMF Content Transformation (@cscott).

This appears to be resolved from my very rudimentary testing.

8. T303029 Revert ParsoidSiteConfigInit hook creation 

> Component: Parsoid.
> Assigned to: WMF Content Transformation, formerly known as Parsing team.

Best,
-- Timo

[Wikitech-l] Production Excellence #41: February 2022

2022-03-14 Thread Krinkle

How’d we do in our strive for operational excellence last month? Read on to 
find out!

Incidents

3 documented incidents last month.

2022-02-01 ulsfo network 
<https://wikitech.wikimedia.org/wiki/Incident_documentation/2022-02-01_ulsfo_network>
Impact: For 3 minutes, clients served by the ulsfo POP were not able to 
contribute or display un-cached pages.

2022-02-22 wdqs updater codfw 
<https://wikitech.wikimedia.org/wiki/Incident_documentation/2022-02-22_wdqs_updater_codfw>
Impact: For 2 hours, WDQS updates failed to be processed. Most bots and tools 
were unable to edit Wikidata during this time.

2022-02-22 vrts 
<https://wikitech.wikimedia.org/wiki/Incident_documentation/2022-02-22_vrts>
Impact: For 12 hours, incoming emails to a specific recently created VRTS queue 
were not processed, with senders receiving a bounce with an SMTP 550 Error.

See also Incident graphs <https://codepen.io/Krinkle/full/wbYMZK>.

Incident follow-up
Remember to review and schedule Incident Follow-up work 
<https://phabricator.wikimedia.org/project/view/4758/> in Phabricator, which 
are preventive measures and tech debt mitigations written down after an 
incident is concluded. Read about past incidents at Incident status 
<https://wikitech.wikimedia.org/wiki/Incident_status> on Wikitech.

Recently conducted incident follow-up:

Create a dashboard for Prometheus metrics about health of Prometheus itself. 
<https://phabricator.wikimedia.org/T222102>
Pitched by CDanis after an April 2019 incident, carried out by Filippo 
(@fgiunchedi).

Improve wording around AbuseFilter messages about throttling functionality. 
<https://phabricator.wikimedia.org/T200036>
Originally filed in 2018. This came up last month during an incident where the 
wording may've led to a misunderstanding. Now resolved by @Daimona.

Exclude restart procedure from automated Elasticsearch provisioning. 
<https://phabricator.wikimedia.org/T290902>
There can be too much automation. Filed after an incident last September. Fixed 
by @RKemper.

Outstanding errors

Take a look at the workboard and look for tasks that could use your help.

→  https://phabricator.wikimedia.org/tag/wikimedia-production-error/

I skip breakdowns most months as each breakdown has its flaws. However, I hear 
people find them useful, so I'll try to do them from time to time with my noted 
caveats. The last breakdown was in the December edition 
<https://phabricator.wikimedia.org/phame/post/view/265/production_excellence_39_december_2021/>,
which focussed on throughput during a typical month. Important to recognise is 
that neither high nor low throughput is per se good or bad. It's good when 
issues are detected, reported, and triaged correctly. It's also good if a 
team's components are stable and don't produce any errors. A report may be 
found to be invalid or a duplicate, which is sometimes only determined a few 
weeks later.

The below "after six months" breakdown takes more of that into consideration by 
looking at what's on the table after six months (tasks up to Sept 2021). This 
may be considered "fairer" in some sense, although it has the drawback of 
suffering from hindsight bias, and possibly not highlighting current or most 
urgent areas.

WMF Product:

 * Anti Harassment Tools (3): 1 MW Blocks, 2 SecurePoll.
 * Community Tech (0).
 * Design Systems (1): 1 WVUI.
 * Editing Team (15): 14 VisualEditor, 1 OOUI.
 * Growth Team (13): 11 Flow, 1 GrowthExperiments, 1 MW Recent changes.
 * Language Team (6): 4 ContentTranslation, 1 CX-server, 1 Translate extension.
 * Parsoid Team (9): 8 Parsoid, 1 ParserFunctions extension.
 * Product Infrastructure (4): 2 JsonConfig, 1 Kartographer, 1 WikimediaEvents.
 * Reading Web (0).
 * Structured Data (4): 2 MW Uploading, 1 WikibaseMediaInfo, 1 3D extension.

WMF Tech:

 * Data Engineering: 1 EventLogging.
 * Fundraising Tech: 1 CentralNotice.
 * Performance: 1 Rdbms.
 * Platform MediaWiki Team (19): 4 MW-Page-data, 1 MW-REST-API, 1 
MW-Action-API, 1 MW-Snapshots, 1 MW-ContentHandler, 1 MW-JobQueue, 1 
MW-libs-RequestTimeout, 9 Other.
 * Search Platform: 1 MW-Search.
 * SRE Service Operations: 1 Other.

WMDE:

 * WMDE-Wikidata (7): 5 Wikibase, 2 Lexeme.
 * WMDE-TechWish: 1 FileImporter.

Other:

 * Missing steward (7): 2 Graph, 2 LiquidThreads, 2 TimedMediaHandler, 1 MW 
Contributions page.
 * Individually maintained (2): 1 WikimediaIncubator, 1 Score extension.

Trends
In February, we reported 25 new production errors 
<https://phabricator.wikimedia.org/maniphest/query/1B79KZ8KkRj6/#R>. Of those, 
13 have since been resolved, and 12 remain open as of today (two weeks into the 
following month). We also resolved 22 errors that remained open from previous 
months. The overall workboard has grown slightly to a total of 301 outstanding 
error reports.

For the month-over-month graph, refer to the spreadsheet. 
<https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroY

[Wikitech-l] Re: Best practices for extensions

2022-03-14 Thread Krinkle
Thanks everyone for sharing thoughts here and on the talk page 
<https://www.mediawiki.org/wiki/Talk:Best_practices_for_extensions>.

A number of clarifications have been made, and some unready/outdated sections 
have for the time being been removed, shortened or replaced with a non-nominal 
reference to a different page (such as Accessibility).

To the best of my knowledge, the remaining points of this best practices guide 
now reflect the practices that most MediaWiki extension maintainers have been 
following in recent years (both in WMF-deployed extensions and many third-party 
extensions alike). As such, I've marked it as a developer guideline.

There are open discussion topics on the talk page 
<https://www.mediawiki.org/wiki/Talk:Best_practices_for_extensions> about more 
practices to add, including a topic about Accessibility guidelines 
<https://www.mediawiki.org/wiki/Topic:Wqvqvhgsvpu1je15> (Do we re-incorporate 
some of it? And how? How much do we duplicate? If not, what should we do 
instead?)

-- Timo

On Thu, 27 Jan 2022, at 04:43, Krinkle wrote:
> Hi all,
> 
> You may be familiar with the Best practices for extensions 
> <https://www.mediawiki.org/wiki/Best_practices_for_extensions> page on 
> mediawiki.org. It has been marked as a draft since 2017.
> 
> I'd like to polish this page and get it to a state where it would be 
> uncontroversial to label it as "Development guideline 
> <https://www.mediawiki.org/wiki/Development_guidelines>". This would not make 
> it a hard policy. Neither does it imply that it covers all practices in all 
> situations.
> 
> Rather, it would mean that the items that are there now are indeed a part of 
> our current best practices. We would keep it alive through bold 
> <https://en.wikipedia.org/wiki/Wikipedia:Be_bold> edits and talk page 
> conversations, similar to our Coding conventions 
> <https://www.mediawiki.org/wiki/Manual:Coding_conventions/PHP> and other such 
> guidelines that we maintain peer to peer and through consensus.
> 
> The reason I've not simply labelled it as such already is because before 
> today I found the document to be out of sync with our actual practices. I 
> have made a number of changes with descriptive edit summaries to bring it in 
> sync with what I percieve to be our best practices; based on how myself and 
> other maintainers perform code review at large, and how we review new 
> extensions prior to deployment.
> 
> All are welcome to fix mistakes, raise questions/concerns on the talk page, 
> on this thread. You're also welcome to message me directly anytime if you 
> prefer.

[Wikitech-l]  Fresh 22.01 released!

2022-02-15 Thread Krinkle
This release adds the npx command, and updates Firefox (78esr to 90esr) and 
Chromium (90 to 97).

Get started by installing, updating, or learning more, at:
https://gerrit.wikimedia.org/g/fresh#fresh-environment

Changelog:
https://gerrit.wikimedia.org/g/fresh/+/22.01.1/CHANGELOG.md

As a reminder, the previous release changed the default "fresh-node" command to 
Node.js 12, and introduced fresh-node14 to use Node.js 14. The older Node.js 10 
environment. If you encounter problems with Node.js 12, let us know on 
Phabricator at https://phabricator.wikimedia.org/tag/fresh/. This is also where 
you can find recently resolved tasks.

Fresh is a fast way to create isolated environments from your terminal. These 
can be used to work more responsibly with 'npm' developer tools such as ESLint, 
QUnit, Grunt, Selenium, and more. Example guide: 
https://www.mediawiki.org/wiki/Manual:JavaScript_unit_testing

--
Timo Tijhof

[Wikitech-l] Re: Best practices for extensions

2022-02-15 Thread Krinkle
Hi Ostrzyciel. Replies inline.

On Thu, 27 Jan 2022, at 08:15, Ostrzyciel wrote:
> Maybe a dumb question – I get a profound impression that this email *and* the 
> guidelines are directed at Wikimedia employees, not MediaWiki extension 
> developers in general.
> 
> ...why? I thought that most extensions out there are not managed by 
> Wikimedia, right?

These guidelines, and indeed most coding conventions and development 
guidelines, are directed at MediaWiki developers in general. They are not 
merely for WMF-maintained extensions or WMF staff.

Of the 70+ bullet points in the current draft, there are exactly 2 points 
specific to WMF, and these are explicitly called out as such to avoid confusion 
(those are about sec/perf review, and stewardship).

The best practices document aims to reflect the status quo among MediaWiki 
developers, which includes maintainers of the many MediaWiki extensions hosted 
in Gerrit that aren't WMF-deployed. It is my understanding that by and large 
extensions follow the same coding styles, conventions, and guidelines. I also 
find in practice that volunteers contribute as much as (if not more than) staff 
to the shaping and implementation of these guidelines. Although these 
volunteers do tend to contribute more to WMF-deployed extensions, they are not 
currently staff.

> In the latter case – are Wikimedia employees the right group to be guiding 
> the discussion around these guidelines?
> 

No. I specifically addressed the email to all/everyone, and all MW developers 
are welcome to participate. Extension maintainers often do follow these 
conventions. But, they are by no means required to. It's not a requirement for 
hosting (unlike e.g. licensing and code of conduct), and there are certainly 
numerous extensions that follow a different coding style, or don't follow any 
(publicly known) guidelines.

There is one line in the TLDR of my email where I called upon tech leads among 
WMDE/WMF staff. I recognise this may have set a confusing tone.

-- Timo


> On 1/27/22 05:43, Krinkle wrote:
>> TLDR: Tech leads please review Best practices for extensions 
>> <https://www.mediawiki.org/wiki/Best_practices_for_extensions> on 
>> mediawiki.org.
>> 
>> Hi all,
>> 
>> You may be familiar with the Best practices for extensions 
>> <https://www.mediawiki.org/wiki/Best_practices_for_extensions> page on 
>> mediawiki.org. It has been marked as a draft since 2017.
>> 
>> I'd like to polish this page and get it to a state where it would be 
>> uncontroversial to label it as "Development guideline 
>> <https://www.mediawiki.org/wiki/Development_guidelines>". This would not 
>> make it a hard policy. Neither does it imply that it covers all practices in 
>> all situations.
>> 
>> Rather, it would mean that the items that are there now are indeed a part of 
>> our current best practices. We would keep it alive through bold 
>> <https://en.wikipedia.org/wiki/Wikipedia:Be_bold> edits and talk page 
>> conversations, similar to our Coding conventions 
>> <https://www.mediawiki.org/wiki/Manual:Coding_conventions/PHP> and other 
>> such guidelines that we maintain peer to peer and through consensus.
>> 
>> The reason I've not simply labelled it as such already is because before 
>> today I found the document to be out of sync with our actual practices. I 
>> have made a number of changes with descriptive edit summaries to bring it in 
>> sync with what I percieve to be our best practices; based on how myself and 
>> other maintainers perform code review at large, and how we review new 
>> extensions prior to deployment.
>> 
>> All are welcome to fix mistakes, raise questions/concerns on the talk page, 
>> on this thread. You're also welcome to message me directly anytime if you 
>> prefer.
>> 
>> If you consider yourself familiar with our practices and/or lead and mentor 
>> other engineers, please take a minute to review the page and consider 
>> whether the items reflect your current understanding and judgement.
>> 
>> --
>> Timo Tijhof,
>> Principal Engineer,
>> Wikimedia Performance Team.

[Wikitech-l] Production Excellence #40: January 2022

2022-02-04 Thread Krinkle
How’d we do in our strive for operational excellence last month? Read on to 
find out!

Incidents
There were no incidents this January. Phew! Remember to review and schedule 
Incident Follow-up work <https://phabricator.wikimedia.org/project/view/4758/> 
in Phabricator. These are preventive measures and tech debt mitigations written 
down after an incident is concluded. Read about past incidents at Incident 
status <https://wikitech.wikimedia.org/wiki/Incident_status> on Wikitech.

Trends
During 2021, I compared us to the median of 4 incidents per month, as measured 
over the two years prior (2019-2020).

I'm glad to announce our median has lowered to 3 per month over the past two 
years (2020-2021). For more plots and numbers about our incident documentation, 
refer to Incident stats <https://codepen.io/Krinkle/full/wbYMZK>.

Since the previous edition, we resolved 17 tasks from previous months. In 
January, there were 45 new error reports 
<https://phabricator.wikimedia.org/maniphest/query/f24Xwi0bGGZU/#R>, of which 
28 were resolved within the same month; the remaining 17 have carried over to 
February.

With precisely 17 tasks both closed and added, the workboard remains at the 
exact total of 298 open tasks, for the third month in a row. That's quite the 
coincidence.

Figure 1: Unresolved error reports by month. 
<https://phabricator.wikimedia.org/phame/post/view/266/production_excellence_40_january_2022/#trends>

Take a look at the workboard and look for tasks that could use your help.
→  https://phabricator.wikimedia.org/tag/wikimedia-production-error/

Thanks!
Thank you to everyone who helped by reporting, investigating, or resolving 
problems in Wikimedia production. Thanks!

Until next time,

– Timo Tijhof

Doc Brown <https://en.wikiquote.org/wiki/Back_to_the_Future_Part_II>: It 
could mean that that point in time contains some cosmic significance… as if 
it were the temporal junction point of the entire space-time continuum… Or it 
could just be an amazing coincidence.



 Share or read later via https://phabricator.wikimedia.org/phame/post/view/266/


[Wikitech-l] Best practices for extensions

2022-01-26 Thread Krinkle
TLDR: Tech leads please review Best practices for extensions 
<https://www.mediawiki.org/wiki/Best_practices_for_extensions> on 
mediawiki.org.

Hi all,

You may be familiar with the Best practices for extensions 
<https://www.mediawiki.org/wiki/Best_practices_for_extensions> page on 
mediawiki.org. It has been marked as a draft since 2017.

I'd like to polish this page and get it to a state where it would be 
uncontroversial to label it as "Development guideline 
<https://www.mediawiki.org/wiki/Development_guidelines>". This would not make 
it a hard policy. Neither does it imply that it covers all practices in all 
situations.

Rather, it would mean that the items that are there now are indeed a part of 
our current best practices. We would keep it alive through bold 
<https://en.wikipedia.org/wiki/Wikipedia:Be_bold> edits and talk page 
conversations, similar to our Coding conventions 
<https://www.mediawiki.org/wiki/Manual:Coding_conventions/PHP> and other such 
guidelines that we maintain peer to peer and through consensus.

The reason I've not simply labelled it as such already is because before today 
I found the document to be out of sync with our actual practices. I have made a 
number of changes with descriptive edit summaries to bring it in sync with what 
I perceive to be our best practices, based on how I and other maintainers 
perform code review at large, and how we review new extensions prior to 
deployment.

All are welcome to fix mistakes, or to raise questions/concerns on the talk 
page or on this thread. You're also welcome to message me directly anytime if 
you prefer.

If you consider yourself familiar with our practices and/or lead and mentor 
other engineers, please take a minute to review the page and consider whether 
the items reflect your current understanding and judgement.

--
Timo Tijhof,
Principal Engineer,
Wikimedia Performance Team.

[Wikitech-l] Production Excellence #39: December 2021

2022-01-17 Thread Krinkle
How’d we do in our strive for operational excellence last month? Read on to 
find out!

Incidents

One documented incident last month (Incident graphs 
<https://codepen.io/Krinkle/full/wbYMZK>).

2021-12-03 mx 
<https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-12-03_mx>
Impact: A portion of outgoing email from wikimedia.org was delivered with a 
delay of up to 24 hours. This affected staff Gmail, and Znuny/Phabricator 
notifications. No mail was lost; it was eventually delivered.

Incident follow-up

Remember to review and schedule Incident Follow-up work 
<https://phabricator.wikimedia.org/project/view/4758/> in Phabricator. These 
are preventive measures and tech debt mitigations written down after an 
incident. Read about past incidents at Incident status 
<https://wikitech.wikimedia.org/wiki/Incident_status> on Wikitech.

Recently resolved incident follow-up:

Create paging alert for high MX queues 
<https://phabricator.wikimedia.org/T297144>.
Filed in December after the mail delivery incident, resolved later that month 
by Keith (Herron).

Limit db execution time of expensive MW special pages 
<https://phabricator.wikimedia.org/T297708>.
Filed in December after various incidents due to high DB/appserver load, 
carried out by Amir (Ladsgroup).

Trends

In December we reported 22 new errors 
<https://phabricator.wikimedia.org/maniphest/query/DhZaBJ5PI1NA/#R>, of which 5 
have since been resolved, and 17 remain open and have carried over to January. 
Of the 298 issues previously carried over, we also resolved 17, thus the 
workboard still adds up to 298 in total.

In previous editions, we sometimes looked at the breakdown of tasks that 
remained unresolved. This time, I'd like to draw attention to the throughput 
and distribution of tasks that did get resolved.

Production errors resolved in the month of December, by team and component 
(query <https://phabricator.wikimedia.org/maniphest/query/vIEXYsei8lwE/#R>):

 * Community-Tech (2): GlobalPreferences (1), CodeMirror (1).
 * DBA: DjVuHandler (1).
 * Editing-team: DiscussionTools (1).
 * Fundraising Tech: CentralNotice (1).
 * Growth-Team (8): GrowthExperiments (6), Image-Suggestions (1), 
StructuredDiscussions (1).
 * Language-Team: UniversalLanguageSelector (1).
 * Parsoid (1).
 * Product-Infrastructure: TemplateStyles (1).
 * Readers-Web (2).
 * Structured-Data (2).
 * Wikidata team: Wikidata-Page-Banner (1).
 * Missing steward (1): MediaWiki-Logevents (T289806: Thanks Umherirrender!).

For the month-over-month numbers, refer to the spreadsheet data 
<https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml>.

Outstanding errors

Oldest unresolved errors:

 * (June 2020) WikibaseClient: RuntimeException in wblistentityusage API. 
T254334 <https://phabricator.wikimedia.org/T254334>
 * (June 2020) WikibaseClient: Deadlock in EntityUsageTable::addUsages method. 
T255706 <https://phabricator.wikimedia.org/T255706>

Take a look at the workboard and look for tasks that could use your help.
→  https://phabricator.wikimedia.org/tag/wikimedia-production-error/


Thanks!

Thank you to everyone who helped by reporting, investigating, or resolving 
problems in Wikimedia production. Thanks!

Until next time,

– Timo Tijhof


 Share or read later via https://phabricator.wikimedia.org/phame/post/view/265/


[Wikitech-l] Production Excellence #38: November 2021

2021-12-11 Thread Krinkle
How’d we do in our strive for operational excellence last month? Read on to 
find out!

Incidents
6 documented incidents last month. That's above the two-year and five-year 
median of 4 per month (per Incident graphs 
<https://codepen.io/Krinkle/full/wbYMZK>).

2021-11-04 large file upload timeouts 
<https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-11-04_large_file_upload_timeouts>;
Impact: For 9 months, editors were unable to upload large files (e.g. to 
Commons). Editors would receive generic error messages, typically after a 
timeout. In retrospect, a dozen distinct production errors had been reported 
and regularly observed that were related and provided different clues; 
however, most of these remained untriaged and uninvestigated for months. This 
may be related to the affected components having no active code steward.

2021-11-05 TOC language converter 
<https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-11-05_TOC_language_converter>;
Impact: For 6 hours, wikis experienced a blank or missing table of contents on 
many pages. For up to 3 days prior, wikis that have multiple language variants 
(such as Chinese Wikipedia) displayed the table of contents in an incorrect or 
inconsistent language variant (which is not understandable to some readers).

2021-11-10 cirrussearch commonsfile outage 
<https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-11-10_cirrussearch_commonsfile_outage>;
 Impact: For ~2.5 hours, the Search results page was unavailable on many wikis 
(except English Wikipedia). On Wikimedia Commons the search suggestions feature 
was unresponsive as well.

2021-11-18 codfw ipv6 network 
<https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-11-18_codfw_ipv6_network>;
 Impact: For 8 minutes, the Codfw cluster experienced partial loss of IPv6 
connectivity for upload.wikimedia.org. This did not affect availability of the 
service because the "Happy Eyeballs 
<https://en.wikipedia.org/wiki/Happy_Eyeballs>" algorithm ensures browsers (and 
other clients) automatically fall back to IPv4. The Codfw cluster generally 
serves Mexico and parts of the US and Canada. The upload.wikimedia.org service 
serves photos and other media/document files, such as displayed in Wikipedia 
articles.

2021-11-23 core network routing 
<https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-11-23_Core_Network_Routing>;
 Impact: For about 12 minutes, Eqiad was unable to reach hosts in other data 
centers via public IP addresses. This was due to a BGP routing error. There was 
no impact on end-user traffic, and impact on internal traffic was limited (only 
Icinga alerts themselves) because internal traffic generally uses local IP 
subnets which we currently route with OSPF instead of BGP.

2021-11-25 eventgate-main outage 
<https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-11-25_eventgate-main_outage>;
 Impact: For about 3 minutes, eventgate-main was down. This resulted in 25,000 
MediaWiki backend errors due to inability to queue new jobs. About 1000 
user-facing web requests failed (HTTP 500 Error). Event production briefly 
dropped from ~3000 per second to 0 per second.

Incident follow-up
Remember to review and schedule Incident Follow-up work 
<https://phabricator.wikimedia.org/project/view/4758/> in Phabricator, which 
are preventive measures and tech debt mitigations written down after an 
incident is concluded. Read more about past incidents at Incident status 
<https://wikitech.wikimedia.org/wiki/Incident_status> on Wikitech.

Recently resolved incident follow-up:

Disable DPL on wikis that aren't using it 
<https://phabricator.wikimedia.org/T287916>
Filed after a July 2021 incident, done by Amir (Ladsgroup) and Kunal (Legoktm).

Create easy access to MySQL ports for faster incident response and maintenance 
<https://phabricator.wikimedia.org/T291352>
Filed in Sep 2021, and carried out by Stevie (Kormat).

Create paging alert for primary DB hosts 
<https://phabricator.wikimedia.org/T233684>
Filed after a Sept 2019 incident, done by Stevie (Kormat).


Trends
November saw 27 new production error reports of which 14 were resolved, and 13 
remain open and carry over to the next month.

Of the 301 errors still open from previous months, 16 were resolved. Together 
with the 13 carried over from November that brings the workboard to 298 
unresolved tasks.

Figure 1: Unresolved error reports by month 
<https://phabricator.wikimedia.org/phame/post/view/261/production_excellence_38_november_2021/#trends>.


Outstanding errors
Take a look at the workboard and look for tasks that could use your help.

→  https://phabricator.wikimedia.org/tag/wikimedia-production-error/

Did you know:
To find your team's error reports, use the appropriate "Filter" link in the 
sidebar of the workboard.

Issues carried over from recent months:

[Wikitech-l] Production Excellence #37: October 2021

2021-11-04 Thread Krinkle
How’d we do in our strive for operational excellence last month? Read on to 
find out!

Incidents
There were 4 documented incidents last month. This is on par with the average 
over the past five years (per Incident graphs 
<https://codepen.io/Krinkle/full/wbYMZK>).

2021-10-08 network provider 
<https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-10-08_network_provider>;
Impact: For up to an hour, some regions experienced a partial connectivity 
outage. This primarily affected the US East Coast for 13 minutes, and Russia 
for 1 hour. It was caused by a routing problem with one among several network 
providers.

2021-10-22 eqiad networking 
<https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-10-22_eqiad_return_path_timeouts>;
 Impact: For 40 minutes clients that are normally geographically routed to 
Eqiad experienced connection or timeout errors. We lost about 7K req/s during 
this time. After initial recovery, Eqiad was ready and repooled in 10 minutes.

2021-10-25 s3 db replica 
<https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-10-25_s3_db_recentchanges_replica>;
 Impact: For 30min MediaWiki backends were slower than usual. For 12 hours, 
many wiki replicas were stale for Wikimedia Cloud Services such as Toolforge.

2021-10-29 graphite 
<https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-10-29_graphite>;
 Impact: During a server upgrade, historical data was lost for a subset of 
Graphite metrics. Some were recovered via the redundant server, but others were 
lost because the redundant server had also been upgraded since then and lost 
some data in a similar fashion.

Remember to review and schedule Incident Follow-up work 
<https://phabricator.wikimedia.org/project/view/4758/> in Phabricator, which 
are preventive measures and tech debt mitigations written down after an 
incident is concluded. Read about past incidents at Incident status 
<https://wikitech.wikimedia.org/wiki/Incident_status> on Wikitech.


Trends


**Norwegian blue** 
*298 bugs were up on the board.
We solved 20 of those over the past thirty days.*

*How many might now be left unexplored?
We also added new bugs to our database.*

*Half those bugs are pining for their fjord.
The other 23 carry on, with their dossiers.*

*All in all, 301 bugs up on the board.*



In October, 49 new tasks 
<https://phabricator.wikimedia.org/maniphest/query/3A8rqYpefUFF/#R> were 
reported as production errors. Of these, we resolved 26, and 23 remain 
unresolved and carry forward to the next month.
Previously, the production error workboard held an accumulated total of 298 
still-open error reports. We resolved 20 of those. Together with the 23 new 
errors carried over from October, this brings us to 301 unresolved errors on 
the board.

Figure 1: Unresolved error reports by month. 
<https://phabricator.wikimedia.org/phame/post/view/260/production_excellence_37_october_2021/#trends>

For the month-over-month numbers, refer to the spreadsheet data 
<https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml>.

Outstanding errors

Take a look at the workboard and look for tasks that could use your help:
View Workboard 
<https://phabricator.wikimedia.org/tag/wikimedia-production-error/>

Issues carried over from recent months:
Apr 2021: 
9 of 42 issues left.
May 2021: 
16 of 54 issues left.
Jun 2021:
9 of 26 issues left.
Jul 2021:
12 of 31 issues left.
Aug 2021:
12 of 46 issues left.
Sep 2021:
11 of 24 issues left.
Oct 2021:
23 of 49 new issues 
<https://phabricator.wikimedia.org/maniphest/query/3A8rqYpefUFF/#R> are carried 
forward.

Thanks

Thank you to everyone who helped by reporting, investigating, or resolving 
problems in Wikimedia production. Thanks!
Until next time,

– Timo Tijhof



  Share or read later via 
https://phabricator.wikimedia.org/phame/post/view/260/

[Wikitech-l]  Umherirrender receives Web Perf Hero award!

2021-10-22 Thread Krinkle
We are happy to announce that Umherirrender has received this quarter's Web
Perf Hero award.

Umherirrender has initiated and carried out significant improvements to the
performance of MediaWiki user preferences (T278650, T58633, and T291748).
The impact is felt widely throughout Wikimedia sites, for example when
switching languages via the ULS selector, exploring Beta Features and
Gadgets, or switching skins. These are all powered by the MediaWiki
"Preferences" component.

The work included implementing support for deferred message parsing in more
HTMLForm classes, and applying this to the Echo and Gadgets extensions.
This cut API latency by over 50%, from 0.7s to 0.3s at the median, and 1.2s
to 0.5s at p95. (See graphs at T278650#7130951.)
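
As a quick check of the "over 50%" figure, the relative reduction can be computed from the before and after latencies quoted above. This is only arithmetic on the numbers in this announcement; the function name is illustrative and not part of any MediaWiki tooling.

```python
def relative_reduction(before_s: float, after_s: float) -> float:
    """Return the fractional latency reduction going from before_s to after_s."""
    return (before_s - after_s) / before_s


# Figures quoted in the announcement, in seconds.
median = relative_reduction(0.7, 0.3)
p95 = relative_reduction(1.2, 0.5)

print(f"median: {median:.0%}, p95: {p95:.0%}")
```

Both values come out a little under 60%, which is consistent with "over 50%" at the median and at p95.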

This award is given to individuals who have gone above and beyond to improve
the performance of Wikimedia Foundation sites. It's awarded once a quarter,
and takes the form of a Phabricator badge.

More information and past recipients:
https://www.mediawiki.org/wiki/Wikimedia_Performance_Team/Web_Perf_Hero_award

Phabricator badge:
https://phabricator.wikimedia.org/badges/view/17/

-- Timo Tijhof, on behalf of Wikimedia Performance Team.

[Wikitech-l] Production Excellence #36: September 2021

2021-10-21 Thread Krinkle
How’d we do in our strive for operational excellence last month? Read on to
find out!
Incidents

We've had quite an eventful month, with 8 documented incidents in
September. That's the highest since last year (Feb 2020) and one of the
three worst months of the last five years.

   - 2021-09-01 partial Parsoid outage
   
<https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-09-01_partial_parsoid_outage>
  - Impact: For 9 hours, 10% of Parsoid requests to parse/save pages
  were failing on all wikis. Little to no end-user impact, apart from minor
  delays due to RESTBase retries.
   - 2021-09-04 appserver latency
   
<https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-09-04_appserver_latency>
  - Impact: For 37 minutes, MW backends were slow with 2% of requests
  receiving errors. This affected all wikis through logged-in users,
  bots/API queries, and some page views from unregistered users (e.g.
  pages that were recently edited or expired from CDN cache).
   - 2021-09-06 Wikifeeds
   
<https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-09-06_Wikifeeds>
  - Impact: For 3 days, the Wikifeeds API failed ~1% of requests (e.g.
  5 of 500 req/s).
   - 2021-09-12 Esams upload
   
<https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-09-12-Esams-upload>
  - Impact: For 20 minutes, images were unavailable for people in
  Europe, affecting all wikis.
   - 2021-09-13 CirrusSearch restart
   
<https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-09-13_cirrussearch_restart>
  - Impact: For ~2 hours, search was unavailable on Wikipedia from all
  regions. Search suggestions were missing or slow, and the search results
  page errored with "Try again later".
   - 2021-09-18 appserver latency
   
<https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-09-18_appserver_latency>
  - Impact: For ~10 minutes, MW backends were slow or unavailable for
  all wikis.
   - 2021-09-26 appserver latency
   
<https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-09-26_appserver_latency>
  - Impact: For ~15 minutes, MW backends were slow or unavailable for
  all wikis.
   - 2021-09-29 eqiad kubernetes
   
<https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-09-29_eqiad-kubernetes>
  - Impact: For 2 minutes, MW backends were affected by a Kubernetes
  issue (via Kask sessionstore). 1500 edit attempts failed (8% of POSTs),
  and logged-in pageviews were slowed down, often taking several seconds.

Remember to review and schedule Incident Follow-up work
<https://phabricator.wikimedia.org/project/view/4758/> in Phabricator,
which are preventive measures and tech debt mitigations written down after
an incident is concluded.

See also Incident graphs <https://codepen.io/Krinkle/full/wbYMZK>.
Trends

The month of September saw 24 new production error reports of which 11 have
since been resolved, and today, three to six weeks later, 13 remain open
and have thus carried over to the next month. This is about average,
although it makes it no less sad that we continue to introduce (and carry
over) more errors than we rectify in the same time frame.

On the other hand, last month we did have a healthy focus on some of the
older reports. The workboard stood at 301 unresolved errors last month. Of
those, 16 were resolved. With the 13 new errors from September, this
reduces the total slightly, to 298 open tasks.

Figure 1: Unresolved error reports by month.
<https://phabricator.wikimedia.org/phame/post/view/259/production_excellence_36_september_2021/#trends>

For the month-over-month numbers, refer to the spreadsheet data
<https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml>
.
Did you know

   -  The default *"system error" page now includes a request ID*. T291192
   <https://phabricator.wikimedia.org/T291192>


   -  To zoom in and find your team's error reports, *use the appropriate
   "Filter" link in the sidebar* of the workboard
   <https://phabricator.wikimedia.org/tag/wikimedia-production-error/>.

Outstanding errors

Take a look at the workboard and look for tasks that could use your help.

→  https://phabricator.wikimedia.org/tag/wikimedia-production-error/

Last few months in review:
Jan 2021 (50 issues
<https://phabricator.wikimedia.org/maniphest/query/gn7TOpf2LdVE/#R>) 3
left. *Unchanged.*
Feb 2021 (20 issues
<https://phabricator.wikimedia.org/maniphest/query/xQxnXZys4q97/#R>) 5 > 4
left.
Mar 2021 (48 issues
<https://phabricator.wikimedia.org/maniphest/query/To0edISjsA9s/#R>) 10 > 9
left.
Apr 2021 (42 issues
<https://phabricator.wikimedia.org/maniphest/query/ORxSVxnJBlLc/#R>) 17 >
10 left.
May 2021 (54 issues
<https://phabricator.wikimedia.or

[Wikitech-l]  Fresh 21.09 released!

2021-09-29 Thread Krinkle
This release includes Node.js 12 and npm 7.

Get started by installing, updating, or learning more, at:
https://gerrit.wikimedia.org/g/fresh#fresh-environment

Changelog:
https://gerrit.wikimedia.org/g/fresh/+/21.09.1/CHANGELOG.md

This release upgrades the default "fresh-node" command to the latest WMF CI
image for Node.js, which bundles Node.js 12 and npm 7. It is based on
Debian 11 ("Bullseye") and includes Firefox 78 and Chromium 90 as well.

The previous Node.js 10 environment remains available as "fresh-node10" in
case you need it during the transition over the coming weeks. It will
likely be removed within a month or two. If you encounter problems with
Node.js 12, let us know on Phabricator! [1]

This release also introduces a "fresh-node14" command for experimentation
with Node.js 14.

Fresh is a fast way to create isolated environments from your terminal.
These can be used to more securely work with 'npm' developer tools such as
ESLint, QUnit, Grunt, Selenium, and more. Example guide:
https://www.mediawiki.org/wiki/Manual:JavaScript_unit_testing

--
Timo Tijhof

[1] https://phabricator.wikimedia.org/tag/fresh/

[Wikitech-l] Is the "clearyourcache" message still useful?

2021-09-25 Thread Krinkle
TLDR: If you use "hard refresh" in your browser after editing a CSS or JS
page on the wiki, I'd like to hear about it so as to figure out whether
there may be a bug in the software. Please do test and confirm it for
yourself once more without a hard refresh, as it might just be an old habit
acting as a placebo :)

-

The "clearyourcache" message is the message that, contrary to its interface
key, instructs users to perform a page refresh in order to see the effect
of their edit to JS and CSS-related pages. For example:

https://translatewiki.net/wiki/MediaWiki:Clearyourcache/en
https://en.wikipedia.org/wiki/MediaWiki:Common.css
https://meta.wikimedia.org/wiki/MediaWiki:OSM.js
https://meta.wikimedia.org/wiki/User:Krinkle/global.js
https://en.wikipedia.org/wiki/MediaWiki:Citoid-template-type-map.json

As I understand it, this message is no longer needed. I'm not aware of
any scenario in which a page carries this message and edits to the page in
question would propagate to the editor's browser sooner as a result of
performing a "hard reload".

If you find yourself doing hard reloads, or know people that do, I'd love
to hear detailed examples of specific combinations of pages/wikis/browsers
where people do this.

I've detailed the technical reasons for why these (should) have no impact,
on Phabricator:
https://phabricator.wikimedia.org/T291744

-- Timo

[Wikitech-l] Universal Edit Button: Remove redundant rel="edit" link head

2021-09-20 Thread Krinkle
As usual with experimental standards that ship in the wild with "temporary"
semantics, the temporary semantics become de facto standard and the "future"
standard remains reserved for theoretical future use.

In T21165 I compare half
a dozen implementations and specs and observe that they all support
rel="alternate" whereas only one supports (in addition to rel="alternate")
also rel="edit".
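
To make the two link forms concrete, here is a minimal sketch of how a client might detect either one in a page head. It uses only the Python standard library. The `application/x-wiki` type and the sample markup are assumptions based on the Universal Edit Button convention as commonly described, not copied from the task or from MediaWiki output, and `EditLinkFinder` is a hypothetical name.

```python
from html.parser import HTMLParser


class EditLinkFinder(HTMLParser):
    """Collect <link> elements that advertise an edit URL."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (rel, href) tuples, in document order

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        # De facto form: rel="alternate" plus the (assumed) wiki edit type.
        if "alternate" in rels and a.get("type") == "application/x-wiki":
            self.found.append(("alternate", a.get("href")))
        # Reserved "future" form: rel="edit".
        elif "edit" in rels:
            self.found.append(("edit", a.get("href")))


# Hand-written sample markup, for illustration only.
sample = """
<head>
<link rel="alternate" type="application/x-wiki" title="Edit" href="/w/index.php?title=Example&amp;action=edit">
<link rel="edit" title="Edit" href="/w/index.php?title=Example&amp;action=edit">
</head>
"""

finder = EditLinkFinder()
finder.feed(sample)
print(finder.found)
```

A client written this way keeps working when the rel="edit" element is dropped, since the rel="alternate" branch still matches.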

Thus I'm concluding that rel="edit" is a dead standard and that at least
for right now there is no benefit to MediaWiki outputting it. There is a
cost for us to output it, but there is not really any significant cost for
clients to support both, and supporting both is mandatory either way for
the theoretical future standard to be adopted.

If, in another ten years, notable clients actually support the supposedly
newer standard, we could switch at that time.

Task: https://phabricator.wikimedia.org/T21165
Commit: https://gerrit.wikimedia.org/r/722485

A potential argument could also be made that it should be removed entirely.
I myself have never understood why one would want a browser extension to
display an Edit button outside the viewport. It seems unappealing from a UX
perspective, and for me personally it would likely either fade into
"banner blindness" and go unnoticed, or get noticed too much if it tries
to get my attention on every editable page. In any event, while I
would love to hear from people who find this useful, I am /not/ proposing
its removal.

-- Timo

[Wikitech-l] Minor changes to MW logging as shown in Logstash

2021-09-17 Thread Krinkle
TLDR:

The way key-values passed to $logger->debug() and $logger->warning() are
shown in Logstash will soon change. The proposed transition will be
forward- and backward-compatible for at least 90 days, so you can always
find your logs in one place, and will not need to update any queries (yet).

Today, metadata from Syslog, Monolog processors, and MediaWiki context
arrays are mixed into one flat array. After the transition, rows in
Logstash will preserve the context as its own array under a "context" key.
Visual example at https://phabricator.wikimedia.org/T247675#7360872.
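
As a rough sketch of the difference in shape, the two functions below build a "flat" record and a record with a separate "context" key. All field names and values are hypothetical and chosen only for illustration; the real Logstash fields are shown in the Phabricator comment linked above. The key collision simulated here is an assumed example of a usability issue, not one taken from the task.

```python
def flat_record(meta: dict, context: dict) -> dict:
    """Current shape: context keys are merged into the top level."""
    return {**meta, **context}


def nested_record(meta: dict, context: dict) -> dict:
    """Proposed shape: context is preserved as its own array under "context"."""
    return {**meta, "context": dict(context)}


# Hypothetical metadata and context, for illustration only.
meta = {"channel": "example", "level": "WARNING", "host": "mw1234"}
context = {"host": "db1001", "user": "Example"}

old = flat_record(meta, context)
new = nested_record(meta, context)

print(old)
print(new)
```

In the flat record the context's "host" silently replaces the metadata's "host", whereas the nested record keeps both.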

Background

Our wiring code for Logstash has diverged from upstream. The V0 format,
which we have effectively forked, has various usability issues.

For these two reasons, the few of us looking after the debug/monolog code
(incl Reedy, Daimona, and myself) would like to transition to use
upstream's V2 format directly without some of the overrides we currently
have.

If you use Logstash regularly and are worried this might affect your
workflow, check out the transition plan [1] and comment on Phabricator (or
email me) with any suggestions or concerns that I might have missed.

[1] https://phabricator.wikimedia.org/T247675#7360872

-- Timo

[Wikitech-l] Production Excellence #35: August 2021

2021-09-07 Thread Krinkle
How’d we do in our strive for operational excellence last month? Read on to
find out!
Incidents

Zero documented incidents last month. Isn't that something! (Incident graphs
<https://codepen.io/Krinkle/full/wbYMZK>)

Learn about past incidents at Incident status
<https://wikitech.wikimedia.org/wiki/Incident_status> on Wikitech. Remember
to review and schedule Incident Follow-up
<https://phabricator.wikimedia.org/project/view/4758/> in Phabricator,
which are preventive measures and other action items to learn from.
Trends

In August we resolved 18 of the 156 reports that carried over from previous
months, and reported 46 new failures in production. Of the new ones, 17
remain unresolved as of writing and will carry over to next month.

The number of new error reports in August was fairly high at 46, compared
to 31 reports in July, and 26 reports in June.

The backlog of "Old" issues saw no progress this past month and remained
constant at 146 open error reports.

Figure 1, Figure 2: Unresolved error reports, stacked by month.
<https://phabricator.wikimedia.org/phame/post/view/248/production_excellence_35_august_2021/#trends>

 *Did you know*: You can zoom in to your team's error reports by using
the appropriate "Filter" link in the sidebar of our shared workboard?

Take a look at the workboard and look for tasks that could use your help.

→  https://phabricator.wikimedia.org/tag/wikimedia-production-error/
Progress

Last few months in review:
Jan 2021 (50 issues
<https://phabricator.wikimedia.org/maniphest/query/gn7TOpf2LdVE/#R>) 3 left.
Feb 2021 (20 issues
<https://phabricator.wikimedia.org/maniphest/query/xQxnXZys4q97/#R>) 6 > 5
left.
Mar 2021 (48 issues
<https://phabricator.wikimedia.org/maniphest/query/To0edISjsA9s/#R>) 13 >
10 left.
Apr 2021 (42 issues
<https://phabricator.wikimedia.org/maniphest/query/ORxSVxnJBlLc/#R>) 18 >
17 left.
May 2021 (54 issues
<https://phabricator.wikimedia.org/maniphest/query/9y.PWGoGgWbK/#R>) 22 >
20 left.
Jun 2021 (26 issues
<https://phabricator.wikimedia.org/maniphest/query/DlpqBkLj0aP4/#R>) 11 >
10 left.
Jul 2021 (31 issues
<https://phabricator.wikimedia.org/maniphest/query/qQAV178rYaJ_/#R>) 16 >
12 left.
Aug 2021 (46 issues
<https://phabricator.wikimedia.org/maniphest/query/_VlOsgZ9On4g/#R>) +17
new unresolved issues.

Tally:
156 issues open, as of Excellence #34
<https://phabricator.wikimedia.org/phame/post/view/247/production_excellence_34_july_2021/>
(July 2021).
-18 issues closed, of the previously open issues.
+17 new issues that survived August 2021.
155 issues open, as of today (3 Sep 2021).
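
The tally is simple bookkeeping, sketched below with the numbers listed above. The function name is illustrative only.

```python
def tally(open_before: int, closed_old: int, new_unresolved: int) -> int:
    """Open reports now = previously open - closed old ones + surviving new ones."""
    return open_before - closed_old + new_unresolved


# Numbers from the August 2021 tally above.
print(tally(156, 18, 17))
```

The same formula should reproduce the totals reported in the other monthly editions from their own closed and carried-over counts.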

For more month-over-month numbers refer to the spreadsheet
<https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml>
.
Thanks!

Thank you to everyone who helped by reporting, investigating, or resolving
problems in Wikimedia production. Thanks!

Until next time,

– Timo Tijhof


 Share or read later via
https://phabricator.wikimedia.org/phame/post/view/248/

[Wikitech-l] Re: Action API: Removal of deprecated CSRF token parameters

2021-08-26 Thread Krinkle
New page:
https://www.mediawiki.org/wiki/MediaWiki_1.37/Deprecation_of_legacy_API_token_parameters

On Wed, 2 Jun 2021 at 17:03, Petr Pchelko  wrote:

> […]
> the following API endpoints were used to obtain a token:
>
>
>- ‘rctoken’ in action=query&list=recentchanges [2]
>- ‘rvtoken’ in action=query&prop=revisions [3]
>- ‘intoken’ in action=query&prop=info [4]
>- ‘ustoken’ in action=query&list=users [5]
>
>
> […] clients now need to use a consolidated ‘action=query&meta=tokens’
> endpoint.
>

It took me a little while to figure out the correct replacement in some
cases, especially for uiprop=preferencestoken and "type=edit", since
"preferences" and "edit" are not accepted by the new API as valid types.

It is not mentioned anywhere on the relevant pages, but I understand these
essentially fall in the bucket of general csrf tokens now. I remember that
from years ago, but did not connect the dots with the API module change at
the same time.
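
As a minimal sketch of the replacement, the snippet below builds a request URL for the consolidated tokens module and reads the token out of a decoded response. It assumes, as described above, that former "edit" and "preferences" tokens are now requested as type "csrf". The helper names are hypothetical, the response dictionary is hand-written in the shape the module returns rather than real output, and no network request is made.

```python
from urllib.parse import urlencode


def tokens_url(api_base: str, token_type: str = "csrf") -> str:
    """Build an action=query&meta=tokens request URL for the given token type."""
    params = {
        "action": "query",
        "meta": "tokens",
        "type": token_type,
        "format": "json",
    }
    return f"{api_base}?{urlencode(params)}"


def extract_token(response: dict, token_type: str = "csrf") -> str:
    """Read the token from a decoded JSON response (key is '<type>token')."""
    return response["query"]["tokens"][f"{token_type}token"]


url = tokens_url("https://www.mediawiki.org/w/api.php")
print(url)

# Hand-written response for illustration; not real API output.
fake_response = {"query": {"tokens": {"csrftoken": "+\\"}}}
print(extract_token(fake_response))
```

A real client would send this request with its session cookies and then pass the returned value as the token parameter of the write action.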

I've captured various bits of information and links together on this wiki
page:
https://www.mediawiki.org/wiki/MediaWiki_1.37/Deprecation_of_legacy_API_token_parameters

-- Timo

[Wikitech-l] Production Excellence #34: July 2021

2021-08-18 Thread Krinkle
How’d we do in our strive for operational excellence last month? Read on to
find out!
Incidents

3 documented incidents last month. That's at the median for the past twelve
months, and slightly below the median of 4 over the past five years (Incident
stats graphs <https://codepen.io/Krinkle/full/wbYMZK>).

   - 2021-07-14 eventgate latency spike
   
<https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-07-14_eventgate-analytics_latency_spike_caused_MW_app_server_overload>
  - Impact: For ~ 10min MediaWiki API clients experienced request
  failures.
   - 2021-07-16 codfw-a2 network
   
<https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-07-16_asw-a2-codfw_network>
  - Impact: For ~ 1 hour Restbase clients received errors, affecting
  mobile apps and ContentTranslation.
   - 2021-07-26 ruwikinews DynamicPageList
   
<https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-07-26_ruwikinews_DynamicPageList>
  - Impact: For 30min, 15% of requests from contributors on all wikis
  failed. There were also brief moments during which no readers could load
  recently modified or uncached pages.

Learn about past incidents at Incident status
<https://wikitech.wikimedia.org/wiki/Incident_status> on Wikitech. Remember
to review and schedule Incident Follow-up
<https://phabricator.wikimedia.org/project/view/4758/> in Phabricator,
which are preventive measures and other action items filed after an
incident.
Trends

Last month the workboard held 154 non-old unresolved error reports. Over
the past thirty days, the collective efforts of our volunteers and
engineering teams have closed 14 of those.

In the month of July we've also introduced or discovered thirty-one new
error reports (that's an average of one production regression every day!).
Of those new error reports, fifteen were resolved and 16 remain unresolved.
The workboard now tallies up to 156 tasks.

Over on the backlog, we're continuing to ploddingly present progress on
production problems from phantoms of christmases past.

Figure 1, Figure 2: Unresolved error reports stacked by month.
<https://phabricator.wikimedia.org/phame/post/view/247/production_excellence_34_july_2021/#trends>

For more month-over-month numbers refer to the spreadsheet data
<https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml>
.
Outstanding errors

Take a look at the workboard
<https://phabricator.wikimedia.org/tag/wikimedia-production-error/> and
look for tasks that could use your help.

Below are various older issues that may have fallen by the wayside, taken
from somewhat-random stab-in-the-dark queries.

Oldest unresolved errors that are still reproducible (Phab query
<https://phabricator.wikimedia.org/maniphest/query/07CAHhY.GApw/#R>):

   - Reported in 2015: Unable to view history of protected Flow board
   (StructuredDiscussions, Growth team), T118502
   <https://phabricator.wikimedia.org/T118502>.
   - Reported in 2016: Error when deleting a heading next to a table
   (VisualEditor, Editing team), T140871
   <https://phabricator.wikimedia.org/T140871>.

Stalled error reports (Phab query
<https://phabricator.wikimedia.org/maniphest/query/Dmy0AuERAQct/#R>):

   - Stalled Mar 2021: Constraints check for Q142 France times out
   (Wikidata, WMDE), T212282 <https://phabricator.wikimedia.org/T212282>.

Oldest error with a patch for review (Phab query
<https://phabricator.wikimedia.org/maniphest/query/eb6hYVaKr0Kx/#R>):

   - Reported in 2016: Maps broken during 2nd live preview (Maps, Product
   Infra), T151524 <https://phabricator.wikimedia.org/T151524>.
   - Reported in 2018: Corrupt connection for cross-wiki db query (Platform
   team), T193565 <https://phabricator.wikimedia.org/T193565>.

Jan 2021 (3 of 50 issues
<https://phabricator.wikimedia.org/maniphest/query/WIP9W8q54HB6/#R> left) ⚠️
*Unchanged. Have a look-see!*
Feb 2021 (6 of 20 issues
<https://phabricator.wikimedia.org/maniphest/query/5MzPJAb5oJgv/#R> left) ⚠️
*Unchanged. Take a gander!*
Mar 2021 (13 of 48 issues
<https://phabricator.wikimedia.org/maniphest/query/RsVPep46KRY4/#R> left) ⚠️
*Unchanged. Check it out!*
Apr 2021 (18 of 42 issues
<https://phabricator.wikimedia.org/maniphest/query/rYyMt_gYYymb/#R> left) -1
May 2021 (22 of 54 issues
<https://phabricator.wikimedia.org/maniphest/query/tmkGqt0C93YG/#R> left) -3
June 2021 (11 of 26 issues
<https://phabricator.wikimedia.org/maniphest/query/roL0TaxtcaLQ/#R> left) -4
July 2021 (16 of 31 issues
<https://phabricator.wikimedia.org/maniphest/query/mUVAD8TJHE3n/#R> left) +31;
-15
Tally
154 issues open, as of Excellence #33 (June 2021)
<https://phabricator.wikimedia.org/phame/post/view/240/production_excellence_33_june_2021/>
.
-14 issues closed, of the previous 154 open issues.
+16 new iss

[Wikitech-l] Re: Goto for microoptimisation

2021-08-13 Thread Krinkle
For the record, I merged Tim's patch last week and
was unaware of this email thread.

My thinking was as follows:

1. The implementation does not depend on the goto statement.

That is, it is not used to write overly-clever or complicated logic. If you
remove the goto statement, the method behaves the exact same way. And thus
the moment it ceases to serve its use (performance optimisation) it can be
safely removed without further thought. I think this is essential to keeping
the code easy to understand and maintain. This one principle actually
covers it all for me. The next three points are implied by this:
1a. This use of goto only jumps downward. Jumping backwards (up) would
likely violate point 1, and either way would imho be too complicated to
think about when debugging or maintaining the code in the future.
Especially the potential for an infinite loop.
1b. This use of goto only jumps to a statement within the same function.
(In fact, jumping to another file, class, or function is not supported by
PHP in the first place. This is literally the only way it can be used.
There is some sanity in the language after all!).
1c. This use of goto serves as a performance optimisation for a hot code
path. Similarly implied by point 1: If it doesn't change behaviour and
doesn't improve performance where it matters, we shouldn't bother using it.

2. An inline comment clearly states it is a performance optimisation, and
explains why it is safe, and how we know the destination is where we would
end up regardless. (e.g. "we're inside a condition for X2, so we can skip
to the else of X1, and no other statements would run between here and
there").
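
Point 1 (removing the goto leaves the behaviour unchanged) can be checked with a small model of the two control flows from the proposal quoted in this thread. This is a sketch in Python, which has no goto, so the jump is modelled by calling the skipped-to action directly. It only demonstrates that both flows run the same actions in the same order; it says nothing about PHP opcode cost, and the function names are hypothetical.

```python
def baseline(x):
    """Original control flow: both comparisons always run."""
    trace = []
    if x == 1:
        trace.append("action1")
    else:
        trace.append("action_not_1")
    if x == 2:
        trace.append("action2")
    else:
        trace.append("action_not_2")
    return trace


def with_skip(x):
    """Models the goto version: when x == 1, skip the second comparison."""
    trace = []
    if x == 1:
        trace.append("action1")
        trace.append("action_not_2")  # stands in for `goto not_2`
        return trace
    trace.append("action_not_1")
    if x == 2:
        trace.append("action2")
    else:
        trace.append("action_not_2")
    return trace


for x in (1, 2, 3):
    assert baseline(x) == with_skip(x)
print("same behaviour for x in 1, 2, 3")
```

Because the traces match for every case, the skip can be removed later without changing what the function does, which is the property point 1 relies on.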

-- Timo


On Sat, 31 Jul 2021 at 05:10, Tim Starling  wrote:

> For performance sensitive tight loops, such as parsing and HTML
> construction, to get the best performance it's necessary to think about
> what PHP is doing on an opcode by opcode basis.
>
> Certain flow control patterns cannot be implemented efficiently in PHP
> without using "goto". The current example in Gerrit 708880 comes down to:
>
> if ( $x == 1 ) {
>   action1();
> } else {
>   action_not_1();
> }
> if ( $x == 2 ) {
>   action2();
> } else {
>   action_not_2();
> }
>
> If $x==1 is true, we know that the $x==2 comparison is unnecessary and is
> a waste of a couple of VM operations.
>
> It's not feasible to just duplicate the actions, they are not as simple as
> portrayed here and splitting them out to a separate function would incur a
> function call overhead exceeding the proposed benefit.
>
> I am proposing
>
> if ( $x == 1 ) {
>   action1();
>   goto not_2; // avoid unnecessary comparison $x == 2
> } else {
>   action_not_1();
> }
> if ( $x == 2 ) {
>   action2();
> } else {
>   not_2:
>   action_not_2();
> }
>
> I'm familiar with the cultivated distaste for goto. Some people are just
> parroting the textbook or their preferred authority, and others are scarred
> by experience with other languages such as old BASIC dialects. But I don't
> think either rationale really holds up to scrutiny.
>
> I think goto is often easier to read than workarounds for the lack of
> goto. For example, maybe you could do the current example with break:
>
> do {
>   do {
>     if ( $x === 1 ) {
>       action1();
>       break;
>     } else {
>       action_not_1();
>     }
>     if ( $x === 2 ) {
>       action2();
>       break 2;
>     }
>   } while ( false );
>   action_not_2();
> } while ( false );
>
> But I don't think that's an improvement for readability.
>
> You can certainly use goto in a way that makes things unreadable, but that
> goes for a lot of things.
>
> I am requesting that goto be considered acceptable for micro-optimisation.
>
> When performance is not a concern, abstractions can be introduced which
> restructure the code so that it flows in a more conventional way. I
> understand that you might do a double-take when you see "goto" in a
> function. Unfamiliarity slows down comprehension. That's why I'm suggesting
> that it only be used when there is a performance justification.
>
> -- Tim Starling

[Wikitech-l] Production Excellence #33: June 2021

2021-07-13 Thread Krinkle
How’d we do in our strive for operational excellence last month?
Incidents

3 documented incidents. That's lower than June in the previous five years
where the month saw 5-9 incidents. I've added a new panel ⭐️ to the Incident
statistics <https://codepen.io/Krinkle/full/wbYMZK> tool. This one plots
monthly statistics on top of previous years, to more easily compare them.

Learn more from the Incident documents
<https://wikitech.wikimedia.org/wiki/Incident_status> on Wikitech, and
remember to review and schedule Incident Follow-up
<https://phabricator.wikimedia.org/project/view/4758/> in Phabricator,
which are preventive measures and other action items filed after an
incident.

---
Trends

In June, work on production errors appears to have stagnated a bit. Or more
precisely, the work only resulted in relatively few tasks being resolved.
15 of the 26 new tasks are still open as of writing.

Of the tasks from previous months, only 11 were resolved, leaving most
columns unchanged. See the table further down for a more detailed breakdown
and links to Phabricator queries for the tasks in question.

With the 15 remaining new tasks, and the 11 tasks resolved from our
backlog, this raises the chart from 150 to 154 tasks.

Figure 1, Figure 2: Unresolved error reports stacked by month.
<https://phabricator.wikimedia.org/phame/post/view/240/production_excellence_33_june_2021/#trends>

Month-over-month plots based on spreadsheet data
<https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml>
.

---
Outstanding errors

Take a look at the workboard and look for tasks that could use your help.
→  https://phabricator.wikimedia.org/tag/wikimedia-production-error/


Summary over recent months:
Jan 2020 (1 of 7 left) ⚠️ Unchanged (over one year old).
Mar 2020 (2 of 2 left) ⚠️ Unchanged (over one year old).
Apr 2020 (4 of 14 left) ⚠️ Unchanged (over one year old).
May 2020 (5 of 14 left) ⚠️ Unchanged (over one year old).
Jun 2020 (5 of 14 left) ⚠️ Unchanged (over one year old).
Jul 2020 (4 of 24 issues) ⚠️ Unchanged (over one year old).
Aug 2020 (11 of 53 issues) ⬇️ One task resolved. -1
Sep 2020 (7 of 33 issues) ⏸ Unchanged.
Oct 2020 (19 of 69 issues) ⏸ Unchanged.
Nov 2020 (8 of 38 issues) ⏸ Unchanged.
Dec 2020 (7 of 33 issues) ⏸ Unchanged.
Jan 2021 (3 of 50 issues
<https://phabricator.wikimedia.org/maniphest/query/WIP9W8q54HB6/#R>) ⏸
Unchanged.
Feb 2021 (6 of 20 issues
<https://phabricator.wikimedia.org/maniphest/query/5MzPJAb5oJgv/#R>) ⬇️ One
task resolved. -1
Mar 2021 (13 of 48 issues
<https://phabricator.wikimedia.org/maniphest/query/RsVPep46KRY4/#R>) ⬇️ One
task resolved. -1
Apr 2021 (19 of 42 issues
<https://phabricator.wikimedia.org/maniphest/query/rYyMt_gYYymb/#R>) ⬇️
Four tasks resolved. -4
May 2021 (25 of 54 issues
<https://phabricator.wikimedia.org/maniphest/query/tmkGqt0C93YG/#R>) ⬇️
Four tasks resolved. -4
June 2021 (15 of 26 issues
<https://phabricator.wikimedia.org/maniphest/query/roL0TaxtcaLQ/#R>)  26
new issues, of which 11 were closed. +26, -11

---
Tally
150 issues open, as of Excellence #32 (May 2021)
<https://phabricator.wikimedia.org/phame/post/view/236/production_excellence_32_may_2021/>
.
-11 issues closed, of the previous 150 open issues.
+15 new issues that survived June 2021.
154 issues open as of yesterday.
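
The tally arithmetic above can be written out as a short sketch. The
function and variable names are made up for illustration and are not part
of any Wikimedia tooling; the figures are the ones quoted in this report.

```python
def tally(open_before: int, closed_from_backlog: int, new_survivors: int) -> int:
    """Return the number of open tasks after one reporting period."""
    return open_before - closed_from_backlog + new_survivors


# June 2021 figures as quoted in the report.
new_in_june = 26          # new tasks filed in June
resolved_in_june = 11     # of those new tasks, resolved within the month
survivors = new_in_june - resolved_in_june

# 150 open as of the previous report, 11 closed from the backlog.
open_now = tally(150, 11, survivors)

print(survivors)  # 15
print(open_now)   # 154
```

The two "11" values are distinct counts that happen to coincide this month:
11 of the new June tasks were closed, and 11 older tasks were closed.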

---
Thanks!

Thank you to everyone who helped by reporting, investigating, or resolving
problems in Wikimedia production. Thanks!

Until next time,

– Timo Tijhof


 Share or read later via
https://phabricator.wikimedia.org/phame/post/view/240/

[Wikitech-l] Production Excellence #32: May 2021

2021-06-20 Thread Krinkle
How’d we do in our strive for operational excellence last month? Read on to
find out!

Read on Phabricator at
https://phabricator.wikimedia.org/phame/post/view/236/
Incidents

Zero incidents recorded in the past month. Yay! That's only five months
after November 2020, the last month without documented incidents (Incident
stats <https://codepen.io/Krinkle/full/wbYMZK>).

Remember to review Preventive measures
<https://phabricator.wikimedia.org/project/view/4758/> in Phabricator,
which are action items filed after an incident.

---
Trends

In May, we unfortunately saw a repeat of the worrying pattern we saw in
April
<https://phabricator.wikimedia.org/phame/post/view/235/production_excellence_31_april_2021/#trends>,
but with higher numbers. We found 54 new errors. This is the most new
errors in a single month, since the Excellence monthly began three years
ago in 2018. About half of these (29 of 54) remain unresolved as of
writing, two weeks into the following month.

Figure 1, Figure 2: Unresolved error reports stacked by month.
<https://phabricator.wikimedia.org/phame/post/view/236/production_excellence_32_may_2021/#trends>

Month-over-month plots based on spreadsheet data
<https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml>
.

---
New errors in May

Below is a snapshot of just the 54 new issues
<https://phabricator.wikimedia.org/maniphest/query/tmkGqt0C93YG/#R> found
last month, listed by their code steward
<https://www.mediawiki.org/wiki/Developers/Maintainers>.

Be mindful that the reporting of errors is not in itself a negative point.
I think it should be celebrated when teams have good telemetry,
detect their issues early, and address them within their development cycle.
It might be more worrisome when teams lack telemetry or time to find such
issues, or can't keep up with the pace at which issues are found.
Anti Harassment Tools None.
Community Tech None.
Editing Team +2, -1 Cite (T283755
<https://phabricator.wikimedia.org/T283755>); OOUI (T282176
<https://phabricator.wikimedia.org/T282176>).
Growth Team +17, -4 Add-Link (T281960
<https://phabricator.wikimedia.org/T281960>); GrowthExperiments (T281525
<https://phabricator.wikimedia.org/T281525> T281703
<https://phabricator.wikimedia.org/T281703> T283546
<https://phabricator.wikimedia.org/T283546> T283638
<https://phabricator.wikimedia.org/T283638> T283924
<https://phabricator.wikimedia.org/T283924>); Echo (T282446
<https://phabricator.wikimedia.org/T282446>); Recent-changes (T282047
<https://phabricator.wikimedia.org/T282047> T282726
<https://phabricator.wikimedia.org/T282726>); StructuredDiscussions (T281521
<https://phabricator.wikimedia.org/T281521> T281523
<https://phabricator.wikimedia.org/T281523> T281782
<https://phabricator.wikimedia.org/T281782> T281784
<https://phabricator.wikimedia.org/T281784> T282069
<https://phabricator.wikimedia.org/T282069> T282146
<https://phabricator.wikimedia.org/T282146> T282599
<https://phabricator.wikimedia.org/T282599> T282605
<https://phabricator.wikimedia.org/T282605>).
Language Team +1 Translate extension (T283828
<https://phabricator.wikimedia.org/T283828>).
Parsing Team +1 Parsoid (T281932 <https://phabricator.wikimedia.org/T281932>
).
Reading Web None.
Structured Data None.
Product Infra Team +1 WikimediaEvents (T282580
<https://phabricator.wikimedia.org/T282580>).
Analytics None.
Performance Team None.
Platform Engineering +16, -11 MediaWiki-API (T282122
<https://phabricator.wikimedia.org/T282122>); MediaWiki-General (T282173
<https://phabricator.wikimedia.org/T282173>); MediaWiki-Page-derived-data (
T281714 <https://phabricator.wikimedia.org/T281714> T281802
<https://phabricator.wikimedia.org/T281802> T282180
<https://phabricator.wikimedia.org/T282180> T283282
<https://phabricator.wikimedia.org/T283282>), MediaWiki-Revision-backend (
T282145 <https://phabricator.wikimedia.org/T282145> T282723
<https://phabricator.wikimedia.org/T282723> T282825
<https://phabricator.wikimedia.org/T282825> T283170
<https://phabricator.wikimedia.org/T283170>); MediaWiki-User-management (
T283167 <https://phabricator.wikimedia.org/T283167>); MW Expedition (T281526
<https://phabricator.wikimedia.org/T281526> T281981
<https://phabricator.wikimedia.org/T281981> T282038
<https://phabricator.wikimedia.org/T282038> T282181
<https://phabricator.wikimedia.org/T282181> T283196
<https://phabricator.wikimedia.org/T283196>).
Search Platform +3, -2 CirrusSearch (T282036
<https://phabricator.wikimedia.org/T282036> T282207
<https://phabricator.wikimedia.org/T282207>); GeoData (T282735
<https://phabricator.wikimedia.org/T282735>).
WMDE TechWish +2, -1 Revision-S

[Wikitech-l] Production Excellence #31: April 2021

2021-05-12 Thread Krinkle
How’d we do in our strive for operational excellence last month? Read on to
find out!

Read on Phabricator at
https://phabricator.wikimedia.org/phame/post/view/235/

Incidents

6 documented incidents. That's above the historical average
<https://codepen.io/Krinkle/full/wbYMZK> of 3–4 per month.

Learn about recent incidents at Incident status
<https://wikitech.wikimedia.org/wiki/Incident_status> on Wikitech, or
Preventive
measures <https://phabricator.wikimedia.org/project/view/4758/> in
Phabricator.

---

Trends

In April, we saw a continuation of the healthy trend that started this
January
<https://phabricator.wikimedia.org/phame/post/view/227/production_excellence_28_january_2021/#trends>
— a trend where the back of the line is moving forward at least as quickly
as the front of the line. We did take a little breather in March
<https://phabricator.wikimedia.org/phame/post/view/229/production_excellence_30_march_2021/#trends>
where we almost broke even, but otherwise the trend is going well.

Last month we bade farewell to the production errors we found in July 2019.
This month we cleared out the column for October 2019.

One point of concern is that we did encounter a high number of new
production errors. These are errors that we failed to catch during development,
code review, continuous integration, beta testing, or pre-deployment
checks. We used to discover about a dozen of those a month, whereas
we found 42 this month. As of writing, 17 of the 42 April-discovered
errors have been resolved.

The "Old" column (generally tracking pre-2019 tasks) grew for the first
time in six months. This increase can largely be attributed to improved
telemetry of client-side errors uncovering issues in under-resourced
products, such as the old Kaltura video player.

Figure 1, Figure 2: Unresolved error reports stacked by month.
<https://phabricator.wikimedia.org/phame/post/view/235/production_excellence_31_april_2021/#trends>

---

Outstanding errors

Take a look at the workboard
<https://phabricator.wikimedia.org/tag/wikimedia-production-error/> and
look for tasks that could use your help.

Summary over recent months, per spreadsheet
<https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml>
:
Aug 2019 (1 of 14 left) ⚠️ Unchanged (over one year old).
Oct 2019 (0 of 12 left) ✅ Last three tasks resolved! -3
Jan 2020 (1 of 7 left) ⚠️ Unchanged (over one year old).
Mar 2020 (2 of 2 left) ⚠️ Unchanged (over one year old).
Apr 2020 (5 of 14 left) ⚠️ Unchanged (over one year old).
May 2020 (5 of 14 left) ⏸ —
Jun 2020 (5 of 14 left) ⬇️ One task resolved. -1
Jul 2020 (4 of 24 issues
<https://phabricator.wikimedia.org/maniphest/query/s__D8Kd0xuQH/#R>) ⬇️ One
task resolved. -1
Aug 2020 (13 of 53 issues
<https://phabricator.wikimedia.org/maniphest/query/hu1yhWu4sXkP/#R>) ⬇️ Two
tasks resolved. -2
Sep 2020 (7 of 33 issues
<https://phabricator.wikimedia.org/maniphest/query/CGFQViLShnOY/#R>) ⏸ —
Oct 2020 (20 of 69 issues
<https://phabricator.wikimedia.org/maniphest/query/MYnnBybPTYpd/#R>) ⬇️ Two
tasks resolved. -2
Nov 2020 (9 of 38 issues
<https://phabricator.wikimedia.org/maniphest/query/CkC_VqQq5VC0/#R>) ⏸ —
Dec 2020 (7 of 33 issues
<https://phabricator.wikimedia.org/maniphest/query/10NQy74iKaZJ/#R>) ⬇️
Four tasks resolved. -4
Jan 2021 (3 of 50 issues
<https://phabricator.wikimedia.org/maniphest/query/WIP9W8q54HB6/#R>) ⬇️ One
task resolved. -1
Feb 2021 (8 of 20 issues
<https://phabricator.wikimedia.org/maniphest/query/5MzPJAb5oJgv/#R>) ⬇️ One
task resolved. -1
Mar 2021 (18 of 48 issues
<https://phabricator.wikimedia.org/maniphest/query/RsVPep46KRY4/#R>) ⬇️
Sixteen tasks resolved. -16
*Apr 2021* (25 of 42 issues
<https://phabricator.wikimedia.org/maniphest/query/rYyMt_gYYymb/#R>) 42 new
issues found, of which 25 remained open. +42; -17
---
Tally
139 issues open, as of Excellence #30
<https://phabricator.wikimedia.org/phame/post/view/229/production_excellence_30_march_2021/>
(March 2021).
-31 issues closed, of the previously open issues.
+25 new issues that survived April 2021.
133 issues open, as of today (12 May 2021).
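
As a cross-check, the per-month deltas in the summary can be reconciled
with this tally. The snippet below is only an illustrative sketch: the
dictionary is transcribed by hand from the list above, and the names are
hypothetical.

```python
# "Resolved" deltas per month, copied from the April 2021 summary.
resolved_by_month = {
    "Oct 2019": 3,
    "Jun 2020": 1,
    "Jul 2020": 1,
    "Aug 2020": 2,
    "Oct 2020": 2,
    "Dec 2020": 4,
    "Jan 2021": 1,
    "Feb 2021": 1,
    "Mar 2021": 16,
}

closed_from_backlog = sum(resolved_by_month.values())

# April itself: 42 new issues found, 17 of them resolved within the month.
survived_april = 42 - 17

# Start from the 139 open issues reported in the previous edition.
open_now = 139 - closed_from_backlog + survived_april

print(closed_from_backlog, survived_april, open_now)
```

If the summed deltas ever disagree with the "issues closed" line of the
tally, one of the per-month rows was probably transcribed incorrectly.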
---

Thanks

Thank you to everyone who helped by reporting, investigating, or resolving
problems in Wikimedia production. Thanks!

Until next time,

– Timo Tijhof

[Wikitech-l]  Fresh 21.04 released!

2021-04-29 Thread Krinkle
This release updates Chromium to 73, and fixes a bug where containers could
sometimes keep running if you force-closed a terminal tab that still had an
open fresh-node prompt. If you found that certain Selenium tests didn't
pass locally after March 10th, this is now fixed as well. [1]

Thanks to Kunal Mehta, Željko Filipin, and James Forrester for their help
with testing and finding bugs.

Install, update, or learn more at:
https://gerrit.wikimedia.org/g/fresh#fresh-environment

Changelog:
https://gerrit.wikimedia.org/g/fresh/+/21.04.1/CHANGELOG.md

Fresh is a fast way to launch isolated dev environments from your terminal.
These can be used to more securely work with npm-based tooling such as
ESLint, QUnit, Grunt, WebdriverIO, and Selenium. Example guides:
* https://www.mediawiki.org/wiki/Manual:JavaScript_unit_testing
*
https://www.mediawiki.org/wiki/Selenium/Getting_Started/Run_tests_using_Fresh
*
https://www.mediawiki.org/wiki/Selenium/How-to/Run_tests_targeting_MediaWiki-Docker-Dev

-- Timo

[1] Fresh directly uses WMF CI's Docker image for Node.js and browsers.
This means you should be able to reproduce any npm or Selenium test locally
and have it behave nearly identically to CI. In March, WMF CI jobs were
slowly updated to use its newer image with Chromium 73. Some Selenium tests
started failing in CI after that, due to Chromium 73 changing how text
nodes normalize line breaks (example: change 666946). After those were
fixed, you may have found that those same tests would now fail in Fresh.
This was because I hadn't yet released this update. They should now be in
sync again!
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] Herald rules

2021-04-29 Thread Krinkle
Have we considered asking upstream Phabricator to add e.g. a non-blocking mode
for a Herald rule? These would e.g. be processed post-send in PHP or in
some other queued manner internally. It would likely need to be restricted
and (to avoid notification complexity) might also be automatically tied to
the action being "silent" like we do for these bots.

This might not replace the bot, but it would be a more widely useful and
beneficial investment perhaps.
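
To make the idea concrete, here is a rough sketch of what "non-blocking"
could mean: blocking rules run during the save, while deferred rules are
only queued and run afterwards. This is written in Python purely for
illustration. It is not Phabricator's or Herald's actual API, and every
name in it (Task, RuleEngine, save, run_deferred) is hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List, Set, Tuple


@dataclass
class Task:
    title: str
    tags: Set[str] = field(default_factory=set)


Rule = Callable[[Task], None]


class RuleEngine:
    """Toy model: blocking rules run on save, deferred rules run later."""

    def __init__(self) -> None:
        self.blocking: List[Rule] = []
        self.deferred: List[Rule] = []
        self._queue: Deque[Tuple[Rule, Task]] = deque()

    def save(self, task: Task) -> None:
        # Only blocking rules cost time while the user waits for the save.
        for rule in self.blocking:
            rule(task)
        # Deferred rules are merely enqueued here.
        for rule in self.deferred:
            self._queue.append((rule, task))

    def run_deferred(self) -> int:
        # Would be called after the response is sent, or by a worker.
        processed = 0
        while self._queue:
            rule, task = self._queue.popleft()
            rule(task)
            processed += 1
        return processed


def tag_ubn(task: Task) -> None:
    if "unbreak" in task.title.lower():
        task.tags.add("UBN")


def tag_docs(task: Task) -> None:
    if "docs" in task.title.lower():
        task.tags.add("Documentation")


engine = RuleEngine()
engine.blocking.append(tag_ubn)    # time-sensitive, stays blocking
engine.deferred.append(tag_docs)   # can wait until after the save

task = Task("Unbreak now: docs page fails to render")
engine.save(task)
tags_after_save = set(task.tags)
processed = engine.run_deferred()
```

In this model the save only pays for the time-sensitive rule; the rest is
applied shortly afterwards, which is similar in effect to the bot but
without the delay of up to an hour.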

-- Timo


On Thu, Apr 22, 2021 at 3:23 PM Amir Sarabadani  wrote:

> Hi,
> I'm really sorry for sending email to such a large venue but I couldn't
> find a better mailing list. Feel free to ignore this email if you don't do
> anything with Herald rules.
>
> Herald rules, a set of rules in Phabricator to automate work, are
> expensive and are slowly making saving any change in Phabricator slower and
> slower. You can take a look at this ticket.
>
> As a result, we have been migrating these rules to the maintenance bot,
> which means changes won't be applied immediately anymore (it'll take up to
> an hour, and they will be applied post-change).
>
> If you see your Herald rule has been disabled, please don't enable it. If
> you want to change it, you can make a PR in the maintenance bot source
> code. Please avoid introducing new Herald rules if there's a similar
> functionality supported by the bot. Just add it to the work list of the bot.
> An exception would be time-sensitive tickets, like handling UBN ones. They
> will stay as Herald rules.
>
> Any sort of change to improve documentation, code health, adding new
> functionality so we can migrate more Herald rules, or migrating existing
> ones would be greatly appreciated. If there are bugs, feel free to create a
> ticket for it.
>
> You can also create email filters to ignore emails triggered by the
> maintenance bot (whose activity will increase).
>
> Best
> --
> Amir (he/him)


Re: [Wikitech-l] Dynamically declaring parser functions?

2021-04-29 Thread Krinkle
It might be (more) feasible to use Lua (and Extension:Scribunto) for this,
or even just templates more generally, as this allows one to define on-wiki
mechanisms dynamically in a more flexible way.

(I've not fully read the thread, apologies if this is incompatible with a
requirement, I thought I'd mention it as it seemed likely to be of use.)

On Sun, Apr 18, 2021 at 2:40 PM FreedomFighterSparrow <
freedomfighterspar...@gmail.com> wrote:

> My use case is this:
>
> I have three alternative access points to the wiki, where some things
> are not allowed - e.g. videos. Each access point has different
> extensions disabled, and then tags and parser functions show "as is" on
> screen, which looks bad.
>
> My goal is to hide those, obviously.
>
> I'm upgrading MW from 1.29 to 1.35; My previous solution is here:
> https://github.com/kolzchut/mediawiki-extensions-NoopTags
>
> Basically I hooked ParserFirstCallInitHook and LanguageGetMagic to
> dynamically declare empty stubs for those missing function parsers and
> tags, using global variables $wgNoopTagsFunctionBlacklist and
> $wgNoopTagsBlacklist.
>
>
> This doesn't work in MW 1.35, because LanguageGetMagic was removed. I
> tried bypassing the issue by hooking GetMagicVariableIDsHook, but
> apparently that's only for "variables" ({{variable}}), and not parser
> functions.
>
> Is there a way to achieve my goal? Either by fixing my extension or
> doing something completely different which I haven't thought about?
>
> Thanks in advance
> Dror


[Wikitech-l] Production Excellence #30: March 2021

2021-04-14 Thread Krinkle
How’d we do in our strive for operational excellence last month? Read on to
find out!

Read on Phabricator at
https://phabricator.wikimedia.org/phame/post/view/229/
Incidents

2 documented incidents. That's average for this time of year, when we usually
had <https://codepen.io/Krinkle/full/wbYMZK> 1-4 incidents.

Learn about recent incidents at Incident status
<https://wikitech.wikimedia.org/wiki/Incident_status> on Wikitech, or
Preventive
measures <https://phabricator.wikimedia.org/project/view/4758/> in
Phabricator.

---

Trends

In March we made significant progress on the outstanding errors of previous
months. Several of the 2020 months are finally starting to empty out. But
with over 30 new tasks from March itself remaining, we did not break even,
and ended up slightly higher than last month. This could be reversing two
positive trends, but I hope not.

Firstly, there was a steep increase in the number of new production errors
that were not resolved within the same month. This runs counter to the positive
trend we started in November. The past four months typically saw 10-20
errors outlive their month of discovery, and this past month saw 34 of its
48 new errors remain unresolved.

Secondly, we saw the overall number of unresolved errors increase again. This
January
<https://phabricator.wikimedia.org/phame/post/view/227/production_excellence_28_january_2021/>
began a downward trend for the first time in thirteen months, which
continued nicely through February. But this past month we merely broke even,
and in fact ended one task higher. I hope this is just a breather and we can
continue our way downward.

Figure 1, Figure 2: Unresolved error reports stacked by month.
<https://phabricator.wikimedia.org/phame/post/view/229/production_excellence_30_march_2021/#trends>

---

Outstanding errors

Take a look at the workboard
<https://phabricator.wikimedia.org/tag/wikimedia-production-error/> and
look for tasks that could use your help.

Summary over recent months, per spreadsheet
<https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml>
:
Jul 2019 (0 of 18 left) ✅ Last two tasks resolved! -2
Aug 2019 (1 of 14 left) ⚠️ *Unchanged (and over one year old).*
Oct 2019 (3 of 12 left) ⬇️ One task resolved. -1
Nov 2019 (0 of 5 left) ✅ Last task resolved! -1
Dec 2019 (0 of 9 left) ✅ Last task resolved! -1
Jan 2020 (2 of 7 left) ⬇️ One task resolved. -1
Feb 2020 (0 of 7 left) ✅ Last task resolved! -1
Mar 2020 (2 of 2 left) ⚠️ *Unchanged (and over one year old).*
Apr 2020 (5 of 14 left) ⬇️ Four tasks resolved. -4
May 2020 (5 of 14 left) ⬇️ One task resolved. -1
Jun 2020 (6 of 14 left) ⬇️ One task resolved. -1
Jul 2020 (5 of 24 issues
<https://phabricator.wikimedia.org/maniphest/query/s__D8Kd0xuQH/#R>) ⬇️
Four tasks resolved. -4
Aug 2020 (15 of 53 issues
<https://phabricator.wikimedia.org/maniphest/query/hu1yhWu4sXkP/#R>) ⬇️
Five tasks resolved. -5
Sep 2020 (7 of 33 issues
<https://phabricator.wikimedia.org/maniphest/query/CGFQViLShnOY/#R>) ⬇️ One
task resolved. -1
Oct 2020 (22 of 69 issues
<https://phabricator.wikimedia.org/maniphest/query/MYnnBybPTYpd/#R>) ⬇️
Four tasks resolved. -4
Nov 2020 (9 of 38 issues
<https://phabricator.wikimedia.org/maniphest/query/CkC_VqQq5VC0/#R>) ⬇️ Two
tasks resolved. -2
Dec 2020 (11 of 33 issues
<https://phabricator.wikimedia.org/maniphest/query/10NQy74iKaZJ/#R>) ⬇️ One
task resolved. -1
Jan 2021 (4 of 50 issues
<https://phabricator.wikimedia.org/maniphest/query/WIP9W8q54HB6/#R>) ⬇️ One
task resolved. -1
Feb 2021 (9 of 20 issues
<https://phabricator.wikimedia.org/maniphest/query/5MzPJAb5oJgv/#R>) ⬇️ Two
tasks resolved. -2
*Mar 2021* (34 of 48 issues
<https://phabricator.wikimedia.org/maniphest/query/RsVPep46KRY4/#R>) 34 new
tasks survived and remain unresolved. +48; -14

---
Tally
138 issues open, as of Excellence #29
<https://phabricator.wikimedia.org/phame/post/view/228/production_excellence_29_february_2021/>
(6 Mar 2021).
-33 issues closed, of the previous 138 open issues.
+34 new issues that survived March 2021.
139 issues open, as of today (2 Apr 2021).
---
Thanks!

Thank you to everyone who helped by reporting, investigating, or resolving
problems in Wikimedia production. Thanks!

Until next time,

– Timo Tijhof


Footnotes:

Incident status, Wikitech
<https://wikitech.wikimedia.org/wiki/Incident_status>.
Wikimedia incident stats by Krinkle, CodePen
<https://codepen.io/Krinkle/full/wbYMZK>.
Production Excellence: Month-over-month spreadsheet and plot
<https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml>
.
Report charts for Wikimedia-production-error project, Phabricator
<https://phabricator.wikimedia.org/project/reports/1055/>.


[Wikitech-l] Production Excellence #29: February 2021

2021-03-05 Thread Krinkle
r.wikimedia.org/tag/wikimedia-production-error/> and
look for tasks that could use your help!

---
 Thanks!

Thank you to everyone else who helped by reporting, investigating, or
resolving problems in Wikimedia production. Thanks!

Until next time,

– Timo Tijhof

---

Footnotes:

[1] Incident status Wikitech
<https://wikitech.wikimedia.org/wiki/Incident_status>.
[2] Wikimedia incident stats by Krinkle, CodePen
<https://codepen.io/Krinkle/full/wbYMZK>.
[3] Month-over-month, Production Excellence spreadsheet
<https://docs.google.com/spreadsheets/d/e/2PACX-1vTrUCAI10hIroYDU-i5_8s7pony8M71ATXrFRiXXV7t5-tITZYrTRLGch-3iJbmeG41ZMcj1vGfzZ70/pubhtml>
.
[4] Open tasks, Wikimedia-prod-error, Phabricator
<https://phabricator.wikimedia.org/maniphest/query/Fw3RdXt1Sdxp/#R>.


[Wikitech-l] Production Excellence #28: January 2021

2021-02-18 Thread Krinkle
ould use your help!

---

 Thanks!

Thank you to everyone else who helped by reporting, investigating, or
resolving problems in Wikimedia production. Thanks!

Until next time,

– Timo Tijhof

---

Footnotes:

[1] Incident status Wikitech
<https://wikitech.wikimedia.org/wiki/Incident_status>.
[2] Wikimedia incident stats by Krinkle, CodePen
<https://codepen.io/Krinkle/full/wbYMZK>.
[3] Month-over-month, Production Excellence spreadsheet
<https://docs.google.com/spreadsheets/d/1tRCh8aB0UYyLlhftvcHvhWH4-e7cF5V01XvRObTVgUI/edit>
.
[4] Open tasks, Wikimedia-prod-error, Phabricator
<https://phabricator.wikimedia.org/maniphest/query/Fw3RdXt1Sdxp/#R>.


[Wikitech-l]  Fresh 21.01.1 released!

2021-02-12 Thread Krinkle
We have published a new release of Fresh. This one improves support for
".env" files with MediaWiki-Docker (thanks Željko Filipin), and adds
support for Podman, a rootless Docker alternative (thanks Kunal Mehta).

Get started by installing or updating, and learn more, at:
https://gerrit.wikimedia.org/g/fresh#fresh-environment

Changelog:
https://gerrit.wikimedia.org/g/fresh/+/21.01.1/CHANGELOG.md


Fresh is a fast way to launch isolated environments from your terminal.
These can be used to more securely work with 'npm' developer tools such as
ESLint, QUnit, Grunt, and Selenium. Example guides:
* https://www.mediawiki.org/wiki/Manual:JavaScript_unit_testing
*
https://www.mediawiki.org/wiki/Selenium/How-to/Run_tests_targeting_MediaWiki-Docker_using_Fresh
*
https://www.mediawiki.org/wiki/Selenium/How-to/Run_tests_targeting_MediaWiki-Docker-Dev

-- Timo


Re: [Wikitech-l] Is preprocessDump.php still useful?

2021-02-04 Thread Krinkle
Drafted removal at:
https://gerrit.wikimedia.org/r/c/mediawiki/core/+/659133

-- Timo


On Thu, Jan 21, 2021 at 1:39 AM Krinkle  wrote:

> This script was created in 2011 and takes an offline XML dump file,
> containing page content wikitext, and feeds its entries through the
> Preprocessor without actually importing any content into the wiki.
>
> The documented purpose of the script is to "get statistics" or "fill the
> cache". I was unable to find any stats being emitted. I did find that the
> method called indeed fills "preprocess-hash" cache keys, which have a TTL
> of 24 hours (e.g. Memcached).
>
> I could not think of a use case for this and am wondering if anyone
> remembers its original purpose and/or knows of a current need for it.
>
> -- Timo
>
> [1] First commit: http://mediawiki.org/wiki/Special:Code/MediaWiki/80466
> [2] Commit history:
> https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+log/1.35.0/maintenance/preprocessDump.php
>
> PS: This came up because there's a minor refactor proposed to the script,
> and I was wondering how to test it and whether it makes sense to
> continue maintenance and support for it.
> https://gerrit.wikimedia.org/r/641323
>


[Wikitech-l] Production Excellence #27: December 2020

2021-02-03 Thread Krinkle
How’d we do in our strive for operational excellence last month? Read on to
find out!

Read on Phabricator at
https://phabricator.wikimedia.org/phame/post/view/219/

   Incidents

1 documented incident in December. [1] In previous years, December
typically had 4 or fewer documented incidents. [3]

Learn about recent incidents at Incident documentation
<https://wikitech.wikimedia.org/wiki/Incident_documentation> on
Wikitech, or Preventive measures
<https://phabricator.wikimedia.org/project/view/4758/> in Phabricator.

---

   Trends

Figure 1, Figure 2: Unresolved error reports stacked by month.
<https://phabricator.wikimedia.org/phame/post/view/219/production_excellence_27_december_2020/#trends>

Month-over-month plots based on spreadsheet data
<https://docs.google.com/spreadsheets/d/1tRCh8aB0UYyLlhftvcHvhWH4-e7cF5V01XvRObTVgUI/edit?usp=sharing>.
[4]

---

   Outstanding errors

Take a look at the workboard and look for tasks that could use your help.
→  https://phabricator.wikimedia.org/tag/wikimedia-production-error/

Summary over recent months:

   - ⚠️ July 2019 (2 of 18 issues left): *no change*.
   - ⚠️ August 2019 (1 of 14 issues): *no change*.
   - ⚠️ September 2019 (2 of 12 issues): One task resolved (-1).
   - ⚠️ October 2019 (5 of 12 issues): *no change*.
   - ⚠️ November 2019 (1 of 5 issues): *no change*.
   - ⚠️ December 2019 (4 of 9 issues), *no change*.
   - ⚠️ January 2020 (2 of 7 issues), *no change*.
   - February 2020 (2 of 7 issues left), *no change*.
   - March 2020 (2 of 2 issues left), *no change*.
   - April 2020 (9 of 14 issues left): *no change*.
   - May 2020 (7 of 14 issues left): *no change*.
   - June 2020 (7 of 14 issues left): *no change*.
   - July 2020 (9 of 24 new issues
   <https://phabricator.wikimedia.org/maniphest/query/s__D8Kd0xuQH/#R>): *no
   change*.
   - August 2020 (23 of 53 new issues
   <https://phabricator.wikimedia.org/maniphest/query/hu1yhWu4sXkP/#R>): *no
   change*.
   - September 2020 (13 of 33 new issues
   <https://phabricator.wikimedia.org/maniphest/query/CGFQViLShnOY/#R>):
   One task resolved (-1).
   - October 2020 (35 of 69 new issues
   <https://phabricator.wikimedia.org/maniphest/query/MYnnBybPTYpd/#R>):
   Four issues fixed (-4).
   - November 2020 (14 of 38 new issues
   <https://phabricator.wikimedia.org/maniphest/query/CkC_VqQq5VC0/#R>):
   Five issues fixed (-5).
   - *December 2020*: 22 of 33 new issues
   <https://phabricator.wikimedia.org/maniphest/query/10NQy74iKaZJ/#R>
   survived the month and remained unresolved (+33; -11)


Recent tally
149 as of Excellence #26
<https://phabricator.wikimedia.org/phame/post/view/218/production_excellence_26_november_2020/>
(15 Dec 2020).
-11 closed of the 149 recent issues.
+22 new issues survived December 2020.
160 as of 27 Jan 2021.

---

   Thanks!

Thank you to everyone else who helped by reporting, investigating, or
resolving problems in Wikimedia production. Thanks!

Until next time,

– Timo Tijhof

---

Footnotes:

[1] Incident documentation 2020, Wikitech
<https://wikitech.wikimedia.org/wiki/Incident_documentation#2021>.
[2] Open tasks, Wikimedia-prod-error, Phabricator
<https://phabricator.wikimedia.org/maniphest/query/Fw3RdXt1Sdxp/#R>.
[3] Wikimedia incident stats by Krinkle, CodePen
<https://codepen.io/Krinkle/full/wbYMZK>.
[4] Month-over-month, Production Excellence spreadsheet
<https://docs.google.com/spreadsheets/d/1tRCh8aB0UYyLlhftvcHvhWH4-e7cF5V01XvRObTVgUI/edit>
.


[Wikitech-l] TechCom digest 2021-01-13

2021-01-26 Thread Krinkle
The minutes from TechCom's triage meeting on 13 January 2021.

Present: Dan A, Daniel K, Tim S., Tim T.

RFC: Introduce PageIdentity [Approved]

   - https://phabricator.wikimedia.org/T208776
   - Last Call is closed, approved.

RFC: Normalize MediaWiki link tables

   - https://phabricator.wikimedia.org/T24
   - Amir shows that the enwiki-pagelinks table has surpassed the size of the
   revisions table.
   - Jaime shows that (post-MCR compression) the revision tables have gotten
   considerably smaller, and that on Commons the image/oldimage tables are
   even bigger than pagelinks and pose a bigger operational risk (since link
   data can be regenerated). He recommends executing the oldimage migration,
   which was approved in 2017 as part of RFC T28741.

RFC: Drop support for older database upgrades [Last Call]

   - https://phabricator.wikimedia.org/T259771
   - Remains on Last Call until next week. Are people quietly excited?

RFC: Stable interface policy, Nov 2020 amendment [Last Call]

   - https://phabricator.wikimedia.org/T268326
   - Remains on Last Call for one more week.


Next week IRC office hours

No IRC discussion scheduled for next week.


You can also find our meeting minutes at
https://www.mediawiki.org/wiki/Wikimedia_Technical_Committee/Minutes


-- Timo


[Wikitech-l] Is preprocessDump.php still useful?

2021-01-20 Thread Krinkle
This script was created in 2011 and takes an offline XML dump file,
containing page content wikitext, and feeds its entries through the
Preprocessor without actually importing any content into the wiki.

The documented purpose of the script is to "get statistics" or "fill the
cache". I was unable to find any stats being emitted. I did find that the
method called indeed fills "preprocess-hash" cache keys, which have a TTL
of 24 hours (e.g. Memcached).

I could not think of a use case for this and am wondering if anyone
remembers its original purpose and/or knows of a current need for it.

-- Timo

[1] First commit: http://mediawiki.org/wiki/Special:Code/MediaWiki/80466
[2] Commit history:
https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/core/+log/1.35.0/maintenance/preprocessDump.php

PS: This came up because there's a minor refactor proposed to the script,
and I was wondering how to test it and whether it makes sense to
continue maintenance and support for it.
https://gerrit.wikimedia.org/r/641323


Re: [Wikitech-l] Test runner Quibble 0.0.46 released

2021-01-13 Thread Krinkle


Thanks Kosta and Antoine. Very exciting.

Follow the roll out of Apache vs PHP dev server, at:
https://phabricator.wikimedia.org/T225218

-- Timo


On Thu, Jan 7, 2021 at 7:55 PM Antoine Musso  wrote:

> Hello,
>
> I am pleased to announce the release of Quibble 0.0.46 mainly driven by
> Adam Wight && Kosta Harlan.
>
>
> The major feature is support for using an external web server such as
> Apache. The php builtin server driven by Quibble serves requests
> serially and does not offer all the customization Apache can do.
>
> The source repository has an example Dockerfile that uses
> supervisord to spawn Apache and point Quibble to it. We will roll that
> system out to the CI jobs progressively over the next few weeks.
>
> The journey started when Kosta benchmarked php vs Apache and by serving
> requests in parallel we have already addressed issues found in MediaWiki
> test suites.
>
> Python 3.8 is officially supported. Versions 3.4 and earlier are no longer
> tested; if you are still using those, you should really upgrade.
>
> Running under podman (a daemonless alternative to docker) is now
> recognized as a container environment (thanks Marius Hoch).
>
>
> Doc: https://doc.wikimedia.org/quibble/
> Changelog: https://doc.wikimedia.org/quibble/changelog.html
> Source: https://gerrit.wikimedia.org/g/integration/quibble/
> Bug/features: #quibble tag in Phabricator
>
> Quibble introduction: https://phabricator.wikimedia.org/J99
>
> cheers,
>
> --
> Antoine "hashar" Musso
>


[Wikitech-l] Production Excellence #26: November 2020

2020-12-15 Thread Krinkle
How’d we do in our strive for operational excellence last month? Read on to
find out!

Or read on Phabricator at
https://phabricator.wikimedia.org/phame/post/view/218/

 Incidents

Zero documented incidents in November. [1] That's the only month this year
without any (publicly documented) incidents. In 2019, November was also the
only such month. [3]

Learn about recent incidents at Incident documentation
<https://wikitech.wikimedia.org/wiki/Incident_documentation> on Wikitech,
or Preventive measures
<https://phabricator.wikimedia.org/project/view/4758/> in Phabricator.

---

 Trends

Figure 1, Figure 2: Unresolved error reports stacked by month.
<https://phabricator.wikimedia.org/phame/post/view/218/production_excellence_26_november_2020/#trends>

The overall increase in errors was relatively low this past month, similar
to the November-December period last year.

What's new is that we can start to see a positive trend emerging in the
backlogs where we've shrunk issue count three months in a row, from the 233
high in October, down to the 181 we have in the ol' backlog today.

Month-over-month plots based on spreadsheet data
<https://docs.google.com/spreadsheets/d/1tRCh8aB0UYyLlhftvcHvhWH4-e7cF5V01XvRObTVgUI/edit?usp=sharing>.
[4]

---

 Outstanding errors

Take a look at the workboard and look for tasks that could use your help.
→ https://phabricator.wikimedia.org/tag/wikimedia-production-error/

Summary over recent months:

   - ⚠️ July 2019 (2 of 18 tasks): One task closed (-1).
   - ⚠️ August 2019 (1 of 14 tasks): *no change*.
   - ⚠️ September 2019 (3 of 12 tasks): *no change*.
   - ⚠️ October 2019 (5 of 12 tasks): *no change*.
   - ⚠️ November 2019 (1 of 5 tasks): *no change*.
   - ⚠️ December 2019 (3 of 9 tasks left), *no change*.
   - January 2020 (3 of 7 tasks left), One task closed (-1).
   - February (2 of 7 tasks left), *no change*.
   - March (2 of 2 tasks left), *no change*.
   - April (9 of 14 tasks left): *no change*.
   - May (7 of 14 tasks left): *no change*.
   - June (7 of 14 tasks left): *no change*.
   - July 2020 (9 of 24 new tasks
   <https://phabricator.wikimedia.org/maniphest/query/s__D8Kd0xuQH/#R>): *no
   change*.
   - August 2020 (23 of 53 new tasks
   <https://phabricator.wikimedia.org/maniphest/query/hu1yhWu4sXkP/#R>):
   Three tasks closed (-3).
   - September 2020 (14 of 33 new tasks
   <https://phabricator.wikimedia.org/maniphest/query/CGFQViLShnOY/#R>):
   One task closed (-1).
   - October 2020 (39 of 69 new tasks
   <https://phabricator.wikimedia.org/maniphest/query/MYnnBybPTYpd/#R>):
   Six tasks closed (-6).
   - *November 2020*: 19 of 38 new tasks
   <https://phabricator.wikimedia.org/maniphest/query/CkC_VqQq5VC0/#R>
   survived the month and remain open today (+38; -19)

Recent tally
142 as of Excellence #25
<https://phabricator.wikimedia.org/phame/post/view/213/production_excellence_25_october_2020/>
(23 Nov 2020).
-12 closed of the 142 recent tasks.
+19 survived November 2020.
149 as of today, 15 Dec 2020.
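
The tally above is simple bookkeeping: take the previous count, subtract the
tasks closed since then, and add the new tasks that survived the month. A
minimal sketch of that arithmetic follows; the function name `recent_tally`
is made up for illustration and is not part of any Wikimedia tooling.

```python
def recent_tally(previous: int, closed: int, survived: int) -> int:
    """Return the number of unresolved recent tasks after one reporting period."""
    return previous - closed + survived


# Figures taken from this report: 142 previously, 12 closed since,
# and 19 new tasks that survived November 2020.
december_tally = recent_tally(previous=142, closed=12, survived=19)
print(december_tally)
```

With the numbers from this report, the result matches the 149 stated above.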

The ongoing month of December has 19 unresolved tasks
<https://phabricator.wikimedia.org/maniphest/query/10NQy74iKaZJ/#R> so far.

---

 Thanks!

Thank you to everyone else who helped by reporting, investigating, or
resolving problems in Wikimedia production. Thanks!

Until next time,

– Timo Tijhof

---

Footnotes:

[1] Incident documentation 2020, Wikitech
<https://wikitech.wikimedia.org/wiki/Incident_documentation#2020>.
[2] Open tasks, Wikimedia-prod-error, Phabricator
<https://phabricator.wikimedia.org/maniphest/query/Fw3RdXt1Sdxp/#R>.
[3] Wikimedia incident stats, Krinkle, CodePen
<https://codepen.io/Krinkle/full/wbYMZK>.
[4] Month-over-month, Production Excellence (spreadsheet)
<https://docs.google.com/spreadsheets/d/1tRCh8aB0UYyLlhftvcHvhWH4-e7cF5V01XvRObTVgUI/edit>.


Re: [Wikitech-l] TechCom meeting 2020-11-25

2020-11-26 Thread Krinkle
The minutes from TechCom's triage meeting on 2020-11-25.

Present:  Tim S, Dan A, Daniel K, Timo T.
RFC: Expiring watchlist entries

   - https://phabricator.wikimedia.org/T124752
   - Last Call to approve is now closed.

RFC: Amendment to the Stable interface policy (Nov 2020)

   - https://phabricator.wikimedia.org/T268326
   - New RFC filed by Daniel.

General ParserCache service class

   - https://phabricator.wikimedia.org/T227776
   - Addshore asking for an update.
   - Daniel is thinking of withdrawing this idea for now. Might not be
   necessary anymore.

Next week IRC office hours

No IRC discussion scheduled for next week.


You can also find our meeting minutes at
https://www.mediawiki.org/wiki/Wikimedia_Technical_Committee/Minutes


-- Timo



On Wed, Nov 25, 2020 at 8:59 PM Krinkle  wrote:

> This is the weekly TechCom board review in preparation of our meeting on
> Wednesday. If there are additional topics for TechCom to review, please let
> us know by replying to this email. However, please keep discussion about
> individual RFCs to the Phabricator tickets.
>
> Activity since Monday 2020-11-02 on the following boards:
>
> https://phabricator.wikimedia.org/tag/techcom/
> https://phabricator.wikimedia.org/tag/techcom-rfc/
>
> Committee inbox:
>
>- T268328: Automatically index extensions in Codesearch
><https://phabricator.wikimedia.org/T268328>
>   - Daniel is raising that people effectively use Codesearch to guide
>   deprecation efforts under the Stable Interface policy. As such, we should
>   define what inclusion criteria it has (or should have), and simplify or
>   document how to implement that in practice through adding and removing
>   repositories from its index (esp those not hosted by Wikimedia).
>- T267085 <https://phabricator.wikimedia.org/T267085>: Clarify
>deprecation of method overrides
>   - A question about the stable interface policy.
>
> Committee board activity:
>
>- T175745 <https://phabricator.wikimedia.org/T175745>: Do not
>overwrite edits when conflicting with self
>   - Some renewed interest on this question about how MW should handle
>   when e.g. someone starts editing the same page from multiple tabs and
>   then submits those edits.
>- T227776 <https://phabricator.wikimedia.org/T227776>: General
>ParserCache service class
>   - Addshore asking for an update.
>
> New RFCs:
>
>- T268326: RFC: Amendment to the Stable interface policy (Nov 2020)
><https://phabricator.wikimedia.org/T268326>
>   - Proposal by Daniel, to:
>  - … fill some gaps (e.g. traits, and member fields).
>  - … allow for removal without (released) deprecation if it is
>  unused in code we know about and is considered "maintained". Input
>  welcome.
>
> Phase progression:
>
>- T266866 RFC <https://phabricator.wikimedia.org/T266866>: Bump Basic
>browser support to require TLS 1.2 for MediaWiki core
>   - Ed lists which Web APIs and other browser capabilities would
>   become safe to use in the base layer (HTML/CSS), as well as some JS
>   features that will automatically become available to Grade A.
>   - Ed confirmed TLS 1.2 mapping to browser names/versions.
>   - Moved to Phase 3: Explore.
>- T260330 RFC: PHP microservice for containerized shell
><https://phabricator.wikimedia.org/T260330>
>   - Moved to Last Call last week, until 2 December (next week).
>   - Tim answered and added a section to clarify the backwards
>   compatible nature of the PHP interface in core, for third-parties that
>   would not or have not installed Shellbox.
>- T259771: RFC: Drop support for database upgrade older than two LTS
><https://phabricator.wikimedia.org/T259771>
>   - Last week's concerns about detection and failure prevention have
>   been answered by Amir.
>   - The Platform Engineering Team has filled the ownership gap for
>   this policy.
>   - Moved to Phase 4: Tune.
>
> IRC meeting request:
>
>
>- Later today (Wed 25 Nov), this RFC will be discussed in
>#wikimedia-office on Freenode IRC:
>RFC: Provide mechanism for configuration sets for development and tests
>https://phabricator.wikimedia.org/T267928
>
>
> Other RFC activity:
>
>- T263841 RFC <https://phabricator.wikimedia.org/T263841>: Expand API
>title generator to support other generated data
>   - Rescoped from potential software change to policy update.
>   - Awaiting resourcing from core API steward to confirm support,
>   risk, compatibility as proposed.
>- T250406 RFC: 

[Wikitech-l] TechCom meeting 2020-11-25

2020-11-25 Thread Krinkle
This is the weekly TechCom board review in preparation of our meeting on
Wednesday. If there are additional topics for TechCom to review, please let
us know by replying to this email. However, please keep discussion about
individual RFCs to the Phabricator tickets.

Activity since Monday 2020-11-02 on the following boards:

https://phabricator.wikimedia.org/tag/techcom/
https://phabricator.wikimedia.org/tag/techcom-rfc/

Committee inbox:

   - T268328: Automatically index extensions in Codesearch
      - Daniel is raising that people effectively use Codesearch to guide
      deprecation efforts under the Stable Interface policy. As such, we should
      define what inclusion criteria it has (or should have), and simplify or
      document how to implement that in practice through adding and removing
      repositories from its index (esp those not hosted by Wikimedia).
   - T267085: Clarify deprecation of method overrides
      - A question about the stable interface policy.

Committee board activity:

   - T175745: Do not overwrite edits when conflicting with self
      - Some renewed interest on this question about how MW should handle
      when e.g. someone starts editing the same page from multiple tabs and
      then submits those edits.
   - T227776: General ParserCache service class
      - Addshore asking for an update.

New RFCs:

   - T268326: RFC: Amendment to the Stable interface policy (Nov 2020)
      - Proposal by Daniel, to:
         - … fill some gaps (e.g. traits, and member fields).
         - … allow for removal without (released) deprecation if it is
         unused in code we know about and is considered "maintained".
         Input welcome.

Phase progression:

   - T266866 RFC: Bump Basic browser support to require TLS 1.2 for
   MediaWiki core
      - Ed lists which Web APIs and other browser capabilities would become
      safe to use in the base layer (HTML/CSS), as well as some JS features
      that will automatically become available to Grade A.
      - Ed confirmed TLS 1.2 mapping to browser names/versions.
      - Moved to Phase 3: Explore.
   - T260330 RFC: PHP microservice for containerized shell
      - Moved to Last Call last week, until 2 December (next week).
      - Tim answered and added a section to clarify the backwards
      compatible nature of the PHP interface in core, for third-parties that
      would not or have not installed Shellbox.
   - T259771: RFC: Drop support for database upgrade older than two LTS
      - Last week's concerns about detection and failure prevention have
      been answered by Amir.
      - The Platform Engineering Team has filled the ownership gap for this
      policy.
      - Moved to Phase 4: Tune.

IRC meeting request:


   - Later today (Wed 25 Nov), this RFC will be discussed in
   #wikimedia-office on Freenode IRC:
   RFC: Provide mechanism for configuration sets for development and tests
   https://phabricator.wikimedia.org/T267928


Other RFC activity:

   - T263841 RFC: Expand API title generator to support other generated data
      - Rescoped from potential software change to policy update.
      - Awaiting resourcing from core API steward to confirm support, risk,
      compatibility as proposed.
   - T250406 RFC: Hybrid extension management
      - Conversation about what we would need to commit to for WMF
      software, and seeking placing and approval of said resourcing.
   - T119173: RFC: Discourage use of MySQL ENUM type
      - Next step is for the consensus to be turned into concrete wording
      for the policy.
   - T40010: RFC: Re-evaluate librsvg as SVG renderer for WMF wikis
      - Some general clarifications, and statistics from production.


-- Timo


[Wikitech-l] Production Excellence #25: October 2020

2020-11-23 Thread Krinkle
How’d we do in our strive for operational excellence last month? Read on to
find out!

Or read on Phabricator at
https://phabricator.wikimedia.org/phame/post/view/213/

 Incidents

2 documented incidents in October. [1] Historically, that's just below the
median of 3 for this time of year. [3]

Learn about recent incidents at Incident documentation
<https://wikitech.wikimedia.org/wiki/Incident_documentation> on Wikitech,
or Preventive measures
<https://phabricator.wikimedia.org/project/view/4758/> in Phabricator.

---

 Trends

Figure 1, Figure 2: Unresolved error reports stacked by month.
<https://phabricator.wikimedia.org/phame/post/view/213/production_excellence_25_october_2020/#trends>

Month-over-month plots based on spreadsheet data. [4]

---

 Outstanding errors

Take a look at the workboard and look for tasks that could use your help.
→ https://phabricator.wikimedia.org/tag/wikimedia-production-error/

Summary over recent months:

   - ⚠️ July 2019 (3 of 18 tasks): One task closed.
   - ⚠️ August 2019 (1 of 14 tasks): *no change*.
   - ⚠️ September 2019 (3 of 12 tasks): *no change*.
   - ⚠️ October 2019 (5 of 12 tasks): One task closed.
   - ⚠️ November 2019 (1 of 5 tasks): Two tasks closed.
   - December (3 of 9 tasks left), *no change*.
   - January 2020 (4 of 7 tasks left), *no change*.
   - February (2 of 7 tasks left), *no change*.
   - March (2 of 2 tasks left), *no change*.
   - April (9 of 14 tasks left): One task closed.
   - May (7 of 14 tasks left): *no change*.
   - June (7 of 14 tasks left): *no change*.
   - July 2020 (9 of 24 new tasks
   <https://phabricator.wikimedia.org/maniphest/query/s__D8Kd0xuQH/#R>):
   One task closed.
   - August 2020 (26 of 53 new tasks
   <https://phabricator.wikimedia.org/maniphest/query/hu1yhWu4sXkP/#R>):
   Five tasks closed.
   - September 2020 (15 of 33 new tasks
   <https://phabricator.wikimedia.org/maniphest/query/CGFQViLShnOY/#R>):
   Two tasks closed.
   - *October 2020*: 45 of 69 new tasks
   <https://phabricator.wikimedia.org/maniphest/query/MYnnBybPTYpd/#R>
   survived the month of October and remain open today.

Recent tally
110 as of Excellence #24
<https://phabricator.wikimedia.org/phame/post/view/205/production_excellence_24_september_2020/>
(23rd Oct).
-13 closed of the 110 recent tasks.
+45 survived October 2020.
142 as of today, 23rd Nov.

For the on-going month of November, there are 25 new tasks
<https://phabricator.wikimedia.org/maniphest/query/CkC_VqQq5VC0/#R> so far.

---

 Thanks!

Thank you to everyone else who helped by reporting, investigating, or
resolving problems in Wikimedia production. Thanks!

Until next time,

– Timo Tijhof

---

Footnotes:

[1] Incident documentation 2020, Wikitech
<https://wikitech.wikimedia.org/wiki/Incident_documentation#2019>
[2] Open tasks in Wikimedia-prod-error, Phabricator
<https://phabricator.wikimedia.org/maniphest/query/Fw3RdXt1Sdxp/#R>
[3] Wikimedia incident stats by Krinkle, CodePen
<https://codepen.io/Krinkle/full/wbYMZK>
[4] Month-over-month, Production Excellence (spreadsheet)
<https://docs.google.com/spreadsheets/d/1tRCh8aB0UYyLlhftvcHvhWH4-e7cF5V01XvRObTVgUI/edit?usp=sharing>


[Wikitech-l] TechCom meeting 2020-11-18

2020-11-18 Thread Krinkle
The minutes from TechCom's triage meeting on 18 November 2020.

Present: Tim S, Daniel K, Timo T.
New RFC: Provide mechanism for overriding configuration for browser tests

   - https://phabricator.wikimedia.org/T267928
   - TT: High time we resource this. Some previous research on this when we
   transitioned browser tests from Ruby to Node/WebdriverIO. At the time, we
   wanted to keep the ability to run the same tests against local+CI+beta,
   which made this rather difficult.
   - Moved to P2.

RFC: Discourage use of MySQL's ENUM type

   - https://phabricator.wikimedia.org/T119173
   - DK: yes discourage by default
   - TS: Jaime mentioned that ENUMs sort differently from text, but also
   said we shouldn't ban it outright.
   - TT: as proposed sounds right. Generally there are better solutions,
   but specific high-scale uses could be allowed as a justified optimization.
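
The sort-order difference mentioned above can be illustrated outside the
database. MySQL orders ENUM values by their position in the column
definition rather than alphabetically. The sketch below imitates both
orderings in Python; the column definition and the rows are hypothetical
and only serve to show the effect.

```python
# Hypothetical column definition: ENUM('small', 'medium', 'large').
ENUM_DEFINITION = ["small", "medium", "large"]

# Hypothetical stored values.
rows = ["large", "small", "medium", "small"]

# Text-like ordering: plain lexicographic comparison of the strings.
text_order = sorted(rows)

# ENUM-like ordering: compare by index in the column definition.
enum_order = sorted(rows, key=ENUM_DEFINITION.index)

print(text_order)
print(enum_order)
```

A query that relies on alphabetical order would therefore return rows in a
different sequence once a text column is swapped for an ENUM, or the other
way around.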

RFC: Drop support for database upgrade older than two LTS releases

   - https://phabricator.wikimedia.org/T259771
   - TT: in principle seems fine, not aware of concerns. We'd want to make
   sure we cover the failure scenarios, e.g. not just soft documentation, but
   actually programmatically detect and prevent disaster. I'll comment
   on-task.
   - DK: Platform team as stakeholder for ..?
   - TT: I guess potential veto in terms of what the minimum support should
   be, and if okay with trailing/dropping, then how long it has to be.

RFC: Expiring watch list entries

   - https://phabricator.wikimedia.org/T124752
   - Last Call ended. Approved.

RFC: Shellbox microservice for MediaWiki

   - https://phabricator.wikimedia.org/T260330
   - TT: Worth noting that it is an optional service. The current logic
   remains the same as before and Shell-exec call API also remains compatible.
   The library can effectively now be put into a container and MW configured
   to use that rather than calling directly.
   - Put on Last Call until 2 December.

Next week IRC office hours

No IRC discussion scheduled for next week.


You can also find our meeting minutes at
https://www.mediawiki.org/wiki/Wikimedia_Technical_Committee/Minutes

If you prefer you can subscribe to our newsletter here
https://www.mediawiki.org/wiki/Newsletter:TechCom_Radar

-- Timo


Re: [Wikitech-l] TechCom topics 2020-11-04 (fixed)

2020-11-10 Thread Krinkle
On Tue, Nov 10, 2020 at 5:50 PM Gergo Tisza  wrote:

> On Tue, Nov 3, 2020 at 1:59 AM Daniel Kinzler 
> wrote:
>
>> TemplateData already uses JSON serialization, but then compresses the
>> JSON output, to make the data fit into the page_props table. This results
>> in binary data in ParserOutput, which we can't directly put into JSON.
>
>
> I'm not sure I understand the problem. Binary data can be trivially
> represented as JSON, by treating it as a string. Is it an issue of storage
> size? JSON escaping of the control characters is (assuming binary data with
> a somewhat random distribution of bytes) an ~50% size increase, UTF-8
> encoding the top half of bytes is another 50%, so it will approximately
> double the length - certainly worse than the ~33% increase for base64, but
> not tragic. (And if size increase matters that much, you probably shouldn't
> be using base64 either.)
>

The binary aspect here refers to the gzip output buffer. While these are
represented in PHP as a string, the string is not encodable as UTF-8 or
indeed as JSON. Attempting to do so results in a PHP json error with
boolean false returned.

Condensed example: https://3v4l.org/cJttU
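
A rough Python analogue of the same problem, for readers who do not want to
run the PHP snippet. This is only a sketch: it is not the code behind the
link above, the payload is made up, and wrapping the bytes in base64 is just
one possible workaround (the one whose ~33% overhead was mentioned earlier
in this thread), not necessarily what TemplateData should do.

```python
import base64
import gzip
import json

# Made-up payload standing in for TemplateData's JSON blob.
payload = json.dumps({"description": "Example template", "params": {}}).encode("utf-8")
compressed = gzip.compress(payload)


def is_valid_utf8(data: bytes) -> bool:
    """Report whether the bytes can be decoded as UTF-8 text."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A gzip stream starts with the bytes 0x1f 0x8b. The second byte is not a
# valid UTF-8 sequence on its own, so the buffer cannot be used as JSON text.
print(is_valid_utf8(compressed))

# Workaround sketch: base64 turns the bytes into plain ASCII, at the cost of
# four output characters for every three input bytes.
encoded = base64.b64encode(compressed).decode("ascii")
document = json.dumps({"templatedata": encoded})

# Round trip back to the original payload.
restored = gzip.decompress(base64.b64decode(json.loads(document)["templatedata"]))
print(restored == payload)
```

Python raises an exception where PHP's json_encode returns false, but the
underlying cause is the same: the compressed buffer is not valid UTF-8.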


Re: [Wikitech-l] TechCom topics 2020-11-04 (fixed)

2020-11-05 Thread Krinkle
On Thu, 5 Nov 2020 at 18:35, Dan Andreescu  wrote:

> On Tue, Nov 3, 2020 at 4:38 AM Daniel Kinzler 
> wrote:
>
>> Am 02.11.20 um 19:24 schrieb Daniel Kinzler:
>>
>> T262946  *"Bump Firefox
>> version in basic support to 3.6 or newer"*: last call ending on
>> Wednesday, November 4. Some comments, no objections.
>>
>>
>> Since we are not having a meeting on Wednesday, I guess we should try and
>> get quorum to approve by mail.
>>
>> I'm in favor.
>>
> +1
>


LGTM3.

-- Timo


Re: [Wikitech-l] TechCom topics 2020-11-04 (fixed)

2020-11-03 Thread Krinkle
*RFC: Expiring watch list entries*
https://phabricator.wikimedia.org/T124752

This just missed the triage window, but it looks like this was implemented
and deployed meanwhile (it was in Phase 3). I'm proposing we put this on
Last Call for wider awareness, so that the team can answer any questions
and address any concerns people might have after reviewing the approach we
now know the team has chosen.

-- Timo

On Mon, Nov 2, 2020 at 6:24 PM Daniel Kinzler 
wrote:

> [Re-posting with fixed links. Thanks for pointing this out Cormac!]
>
> This is the weekly TechCom board review.  Remember that there is no
> meeting on Wednesday, any discussion should happen via email. For
> individual RFCs, please keep discussion to the Phabricator tickets.
>
> Activity since Monday 2020-10-26 on the following boards:
>
> https://phabricator.wikimedia.org/tag/techcom/
>
> https://phabricator.wikimedia.org/tag/techcom-rfc/
>
> Committee board activity:
>
>- T175745 *"overwrite edits
>when conflicting with self"* has once again come up while working on
>EditPage. There seems to no longer be any reason for this behavior. I think
>it does more harm than good. We should just remove it.
>
> RFCs:
>
> Phase progression:
>
>- T266866 *"Bump basic
>supported browsers (grade C) to require TLS 1.2"*: newly filed, lively
>discussion. Phase 1 for now.
>- T263841 *"Expand API title
>generator to support other generated data"*: dropped back to phase 2
>because resourcing is unclear.
>- T262946 *"Bump Firefox
>version in basic support to 3.6 or newer"*: last call ending on
>Wednesday, November 4. Some comments, no objections.
>
>
> Other RFC activity:
>
>- T250406 *"Hybrid
>extension management"*: Asked for clarification of expectations for WMF
>to publish extensions to packagist. Resourcing is being discussed in the
>platform team.
>
> Cheers,
> Daniel
>


Re: [Wikitech-l] Making breaking changes without deprecation?

2020-11-03 Thread Krinkle
On Thu, Sep 3, 2020 at 7:26 AM Robert Vogel  wrote:

> For the BlueSpice distribution ...
>
> * we have got ~90 active repos hosted on WMF Gerrit and another ~10 in our
> internal Gitlab
>
> * we want to develop as much as possible on the public infrastructure of
> the WMF, so the remaining internal repos will (hopefully) be published in
> the future
>

Note that any public Git repo would suffice for the purposes of this
proposal.
The Codesearch tool, which is what is mainly used for this in practice,
already indexes various third-party hosted Git repositories,
including some hosted on GitHub.com.
https://codesearch.wmcloud.org/search/

You can file a task for additional repos to be indexed:
https://phabricator.wikimedia.org/tag/vps-project-codesearch/


-- Timo


[Wikitech-l] Production Excellence #24: September 2020

2020-10-23 Thread Krinkle
 Read on Phabricator at
https://phabricator.wikimedia.org/phame/post/view/205/
---

How’d we do in our strive for operational excellence last month? Read on to
find out!

 *Incidents*

5 documented incidents. [1] Historically, that's right on average for the
time of year. [3]

For more about recent incidents see Incident documentation
<https://wikitech.wikimedia.org/wiki/Incident_documentation> on Wikitech,
or Preventive measures
<https://phabricator.wikimedia.org/project/view/4758/> in Phabricator.

---

 *Trends*

For month-over-month plots, see this spreadsheet
<https://docs.google.com/spreadsheets/d/1tRCh8aB0UYyLlhftvcHvhWH4-e7cF5V01XvRObTVgUI/edit?usp=sharing>.
[4]

---

 *Outstanding errors*

Take a look at the workboard and look for tasks that could use your help.
→ https://phabricator.wikimedia.org/tag/wikimedia-production-error/

Summary over recent months:

   - ⚠️ July 2019 (4 of 18 tasks left): *no change*.
   - ⚠️ August 2019 (1 of 14 tasks left): *no change*.
   - ⚠️ September 2019 (3 of 12 tasks left): *no change*.
   - ⚠️ October (6 of 12 tasks left), *no change*.
   - November (3 of 5 tasks left): *no change*.
   - December (3 of 9 tasks left), *no change*.
   - January 2020 (4 of 7 tasks left), One task closed.
   - February (2 of 7 tasks left), *no change*.
   - March (2 of 2 tasks left), *no change*.
   - April (10 of 14 tasks left): *no change*.
   - May (7 of 14 tasks left): *no change*.
   - June (7 of 14 tasks left): Three tasks closed.
   - July 2020 (10 of 24 new tasks
   <https://phabricator.wikimedia.org/maniphest/query/s__D8Kd0xuQH/#R>):
   Three tasks closed.
   - August 2020 (31 of 53 new tasks
   <https://phabricator.wikimedia.org/maniphest/query/hu1yhWu4sXkP/#R>):
   Six tasks closed.
   - *September 2020*: 17 of 33 new tasks
   <https://phabricator.wikimedia.org/maniphest/query/CGFQViLShnOY/#R>
   survived the month of September and remain open today.

Recent tally:
106 as of Excellence #23
<https://phabricator.wikimedia.org/phame/post/view/204/production_excellence_23_july_august_2020/>
(Sep 23rd).
-13 closed of the 106 recent tasks.
+17 survived September 2020.
110 as of today, Oct 23rd.

Previously
<https://phabricator.wikimedia.org/phame/post/view/204/production_excellence_23_july_august_2020/>,
we had 106 unresolved production errors from the recent months up to
August. Since then, 13 of those were closed. But, the 17 errors surviving
September raise our recent tally to 110.

The workboard overall (including errors from 2019 and earlier) holds 343
open tasks in total, an increase of +47 compared to the 296 total on Sept
23rd.

---
 *Thanks!*

Thank you to everyone else who helped by reporting, investigating, or
resolving problems in Wikimedia production. Thanks!

Until next time,

– Timo Tijhof
---

Footnotes:

[1] Incidents. –
https://wikitech.wikimedia.org/wiki/Incident_documentation#2020
[2] Open tasks. –
https://phabricator.wikimedia.org/maniphest/query/Fw3RdXt1Sdxp/#R
[3] Wikimedia incident stats. – https://codepen.io/Krinkle/full/wbYMZK
[4] Month-over-month plots. –
https://docs.google.com/spreadsheets/d/1tRCh8aB0UYyLlhftvcHvhWH4-e7cF5V01XvRObTVgUI/edit?usp=sharing


Re: [Wikitech-l] TechCom meeting 2020-10-21

2020-10-21 Thread Krinkle
The minutes from TechCom's triage meeting on 21 October 2020.

Present: Dan A, Tim S, Timo T.
Should npm packages maintained by Wikimedia be scoped or unscoped?

   - https://phabricator.wikimedia.org/T239742
   - NL: Still in inbox
   - TT: Seems companies do this both ways at times. Might not need a strict
   policy.
   - TS: Delegate to FSG?
   - TT: I'll ask Volker to bring it up in the next meeting.
   - Moved to Watching.

RFC: Bump Firefox version in basic support to 3.6 or newer

   - https://phabricator.wikimedia.org/T262946
   - Moved to Last Call.

RFC: PHP microservice for containerized shell execution

   - https://phabricator.wikimedia.org/T260330
   - Development is going right along. The basics have been figured out.
   - Still room for input on what the public interface of the MW service
   class would look like, and how e.g. an admin would write or generate the
   configuration for their Shellbox instance for the specific extensions they
   have (e.g. Score requires additional commands to be registered in the
   service).
   - Currently in P4. We should hear from e.g. Fandom and BlueSpice with any
   concerns to consider, or a confident signal that they're happy with this as
   it stands.

Next week IRC office hours

No IRC discussion scheduled for next week.


You can also find our meeting minutes at
https://www.mediawiki.org/wiki/Wikimedia_Technical_Committee/Minutes

If you prefer you can subscribe to our newsletter here

https://www.mediawiki.org/wiki/Newsletter:TechCom_Radar


-- Timo

On Mon, Oct 19, 2020 at 6:12 PM Niklas Laxström 
wrote:

> This is the weekly TechCom board review in preparation of our meeting on
> Wednesday. If there are additional topics for TechCom to review, please let
> us know by replying to this email. However, please keep discussion about
> individual RFCs to the Phabricator tickets.
>
> Activity since Monday 2020-10-15 on the following boards:
>
> https://phabricator.wikimedia.org/tag/techcom/
> https://phabricator.wikimedia.org/tag/techcom-rfc/
>
> Committee inbox:
>
> - T239742 Should npm packages maintained by Wikimedia be scoped or
>   unscoped?
>    - Still in inbox
>
> Committee board activity:
>
> - T263904 Are traits part of the stable interface?
>    - Daniel moved to in progress (see last week's email thread)
>
> New RFCs: (none)
>
> Phase progression:
>
> - T262946 Bump Firefox version in basic support to 3.6 or newer
>    - P3 -> P4: ready to go on last call
>
> IRC meeting request: (none)
>
> Other RFC activity:
>
> - T119173 RFC: Discourage use of MySQL's ENUM type
>    - Concerns about removing existing uses, but it doesn't seem
>      necessary to remove them
>


[Wikitech-l] Production Excellence #23: July & August 2020

2020-09-23 Thread Krinkle
 Read on Phabricator at
https://phabricator.wikimedia.org/phame/post/view/204/
---

How’d we do in our strive for operational excellence last month? Read on to
find out!

##    *Incidents*

4 documented incidents in July, and 2 documented incidents in August. [1]
Historically, that's about average for this time of year. [5]

For more about recent incidents see Incident documentation
<https://wikitech.wikimedia.org/wiki/Incident_documentation> on Wikitech,
or Preventive measures
<https://phabricator.wikimedia.org/project/view/4758/> in Phabricator.

---

##    *Trends*

Take a look at the workboard and look for tasks that could use your help.
→ https://phabricator.wikimedia.org/tag/wikimedia-production-error/

Summary over recent months:

   - ⚠️ July 2019 (4 of 18 tasks left): One task closed.
   - ⚠️ August 2019 (1 of 14 tasks left): *no change.*
   - ⚠️ September 2019 (3 of 12 tasks left): Two tasks closed.
   - October (6 of 12 tasks left): *no change.*
   - November (3 of 5 tasks left): *no change.*
   - December (3 of 9 tasks left): Two tasks closed.
   - January 2020 (5 of 7 tasks left): *no change.*
   - February (2 of 7 tasks left): Two tasks closed.
   - March (2 of 2 tasks left): *no change.*
   - April (10 of 14 tasks left): One task closed.
   - May (7 of 14 tasks left): Four tasks closed.
   - June (10 of 14 tasks left): Four tasks closed.
   - July 2020: 13 of 24 new tasks
   <https://phabricator.wikimedia.org/maniphest/query/s__D8Kd0xuQH/#R>
   survived the month of July and remain open today.
   - August 2020: 37 of 53 new tasks
   <https://phabricator.wikimedia.org/maniphest/query/hu1yhWu4sXkP/#R>
   survived the month of August and remain open today.

Recent tally:
72  open, as of Excellence #22
<https://phabricator.wikimedia.org/phame/post/view/203/production_excellence_22_june_2020/>
(Jul 23rd).
-16  closed, of the previous 72 recent tasks.
+13  opened and survived July 2020.
+37  opened and survived August 2020.
106  open, as of today (Sep 23rd).

Previously
<https://phabricator.wikimedia.org/phame/post/view/203/production_excellence_22_june_2020/>,
we had 72 open production errors over the recent months up to June. Since
then, 16 of those were closed. But, the 13 and 37 errors surviving July and
August raise our recent tally to 106.

The workboard overall (including tasks from 2019 and earlier) held 192 open
production errors on July 23rd. As of writing, the workboard holds 296 open
tasks in total. [4] This +104 increase is largely due to the merged backlog
of JavaScript client errors, which were previously untracked. Note that we
backdated the majority of these JS errors under “Old”, so they are not
among the elevated numbers of July and August.

---

##    *Thanks!*

Thank you to everyone who helped by reporting, investigating, or
resolving problems in Wikimedia production. Thanks!

Until next time,

– Timo Tijhof

---


Footnotes:
[1] Incidents. – https://wikitech.wikimedia.org/wiki/Incident_documentation#2020
[2] Tasks created. – https://phabricator.wikimedia.org/maniphest/query/JuSOycDOe.7R/#R
[3] Tasks closed. – https://phabricator.wikimedia.org/maniphest/query/aKNrCHMosori/#R
[4] Open tasks. – https://phabricator.wikimedia.org/maniphest/query/Fw3RdXt1Sdxp/#R
[5] Wikimedia incident stats. – https://codepen.io/Krinkle/full/wbYMZK


Re: [Wikitech-l] "Library" for sharing useful resources for learning technical topics

2020-09-21 Thread Krinkle
Thanks for starting this Amir!

I've merged some of my recommendations from [[mw:JSTIPS]] into it, and
redirected that to your library going forward.
The collection is growing quite nicely at
https://www.mediawiki.org/wiki/Library. Check it out!

-- Timo



On Tue, May 26, 2020 at 7:39 PM Amir Sarabadani  wrote:

> I moved it to https://wikitech.wikimedia.org/wiki/Library
> @Stephen Thanks. I will watch and add them.
>
> On Tue, May 26, 2020 at 8:30 PM Stephen Niedzielski <
> sniedziel...@wikimedia.org> wrote:
>
> > Regarding Vue.js, we've been aggregating some resources at
> > https://www.mediawiki.org/wiki/Vue.js. Edits welcome.
> >
> > Stephen
> >
> >
> > On Tue, May 26, 2020 at 12:24 PM Aron Demian 
> > wrote:
> > >
> > > On Tue, 26 May 2020 at 19:27, Amir Sarabadani 
> > wrote:
> > >
> > > > Instead, I have this idea to have a virtual library for developers
> > > > so they can share useful resources with each other. You go to a wiki
> > > > page and see a list of courses, books, conference videos, on each
> > > > topic and different people recommending them. You can also request a
> > > > resource for a topic and people respond to you. If the wiki page
> > > > grows too big, we can split them to sub pages based on topics, and so
> > > > on.
> > > >
> > > > What do you think?
> > >
> > >
> > > It's definitely a great idea. Sharing is the primary purpose of the
> > > movement and I guess many of us have their own library of useful
> > > resources to pick from.
> > >
> > > Demian (Aron)
>
>
>
> --
> Amir (he/him)


[Wikitech-l]  Fresh 20.08.1 released!

2020-09-21 Thread Krinkle
I've published a minor release to Fresh. It fixes compat for SELinux
(thanks Marius Hoch), and updates the Docker image to match the latest WMF
CI with npm 6.14 and Firefox 68.11 (thanks Antoine Musso).

Getting started:
https://gerrit.wikimedia.org/g/fresh/#fresh-environment

Changelog:
https://gerrit.wikimedia.org/g/fresh/+/20.08.1/CHANGELOG.md

Fresh is a fast way to launch isolated shells from your terminal. These can
be used to more securely perform 'npm' install and run commands such as for
ESLint, QUnit, Grunt, or Selenium. Example guides:
* https://www.mediawiki.org/wiki/Manual:JavaScript_unit_testing#Getting_started
* https://www.mediawiki.org/wiki/Selenium/Node.js/Target_Local_MediaWiki_(Container)

-- Timo


Re: [Wikitech-l] TechCom meeting 2020-09-16

2020-09-16 Thread Krinkle
The minutes from TechCom's triage meeting on 2020-09-16.

Present:  Daniel K, Dan A, Giuseppe L, Tim S, Timo T.

== Links Recommendations Service ==

* https://phabricator.wikimedia.org/T261411
* https://phabricator.wikimedia.org/T252822
* GL: Idea is to store a blob of wikitext for every page
* What’s best practice for storing derived data that would be invalidated
by every edit?
* TS: similar to data pipelines like page content service
* TS: similar to links tables
* DK: have some kind of “slow” page_props.
* TS: will lead to race conditions
* GL: wikitext for every page may mean considerable volume
* DK: might fit the generalized parser cache. Would have to be even more
generalized.
* DK: Dependency Engine may solve part of this, though not storage

== EventStream needs timestamps ==

* DA: EventStream needs timestamps, which some event hooks are missing.
* Pass LogEntry objects to relevant hooks?
* DK: adding parameters to hooks is a breaking interface change.
* TS: hook deprecation mechanism should take care of this.
* DA: ArticleDelete hook exposes it already, works well, getTimestamp. Can
do for others.
* TT: In light of T212482, be sure to type against a narrower getter-only
interface as the set/save methods must not be used at that point.
* DK: That’s a good idea also to enable async hooks, safe to serialize
value objects. NonSerializable trait may be useful
https://gerrit.wikimedia.org/r/c/mediawiki/core/+/625612
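
To illustrate the point about typing hooks against a narrower getter-only
interface, here is a minimal PHP sketch. All names in it (ReadOnlyLogEntry,
WritableLogEntry, onEntryPublished) are hypothetical and are not MediaWiki's
actual classes or hooks; it only shows the general pattern under that
assumption.

```php
<?php
// Hypothetical names throughout; these are not MediaWiki's real classes.

/** Getter-only view of a log entry, suitable for passing to hook handlers. */
interface ReadOnlyLogEntry {
	public function getType(): string;
	public function getTimestamp(): string;
}

/** Full entry, used by the code that builds and modifies it. */
class WritableLogEntry implements ReadOnlyLogEntry {
	/** @var string */
	private $type;
	/** @var string */
	private $timestamp;

	public function __construct( string $type, string $timestamp ) {
		$this->type = $type;
		$this->timestamp = $timestamp;
	}

	public function getType(): string {
		return $this->type;
	}

	public function getTimestamp(): string {
		return $this->timestamp;
	}

	public function setTimestamp( string $timestamp ): void {
		$this->timestamp = $timestamp;
	}
}

/**
 * Hypothetical hook handler. The parameter type only promises getters,
 * so static analysis can flag a handler that calls setTimestamp() here.
 */
function onEntryPublished( ReadOnlyLogEntry $entry ): string {
	return $entry->getType() . ' at ' . $entry->getTimestamp();
}

$entry = new WritableLogEntry( 'delete', '20200916182800' );
$summary = onEntryPublished( $entry );
echo $summary, "\n";
```

Note that PHP does not stop a handler from calling other methods on the
object at runtime; the narrower type mainly helps static analysis and code
review catch such calls.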

== Next week IRC office hours ==
No IRC discussion scheduled for next week.

On Wed, Sep 16, 2020 at 6:28 PM Daniel Kinzler 
wrote:

> This is the weekly TechCom board review in preparation of our meeting on
> Wednesday. If there are additional topics for TechCom to review, please
> let us know by replying to this email. However, please keep discussion
> about individual RFCs to the Phabricator tickets.
>
> Activity since Monday 2020-09-07 on the following boards:
>
> https://phabricator.wikimedia.org/tag/techcom/
> https://phabricator.wikimedia.org/tag/techcom-rfc/
>
> IRC meeting request:
> * Public discussion TODAY: "PHP microservice for containerized shell
>   execution"
>   Join us at 21:00 UTC (23:00 CEST, 2pm PDT) in the #wikimedia-office
>   channel on freenode.
>
>
> Other RFC activity:
> * "Parsoid Extension API": Subbu documented status of outreach with
>   various stakeholders. 
>
> --
> Daniel Kinzler
> Principal Software Engineer, Core Platform
> Wikimedia Foundation
>


Re: [Wikitech-l]  Wikimedia production errors help

2020-09-16 Thread Krinkle
On Tue, Sep 15, 2020 at 10:00 AM Niklas Laxström 
wrote:

> On Mon, 14 Sep 2020 at 23:49, Tyler Cipriani (tcipri...@wikimedia.org)
> wrote:
> > The number of new tasks being created with this tag in a given week is
> > outpacing the number of tasks being closed in a given week: this past
> > week we added 41 tasks and only closed 22.
>
> Majority of the recently created tasks are frontend JavaScript errors.
> The logging of these errors have only started recently.


Aye, this is indeed a distraction currently. In talking with Tyler prior to
this email I failed to highlight what I think the main area of concern is,
which is indeed not just the total number of reports from this and last
month.

Rather, my main concern is that over the past six months (including long
before the JS stuff came along), we've fallen behind quite a bit in
addressing ongoing production errors.

For example, of the 30-odd backend errors reported in June, 14 were still
open a month later in July [1], and 12 were still open – three months later
– in September. The majority of these haven't even been triaged, assigned,
or otherwise acknowledged yet. And meanwhile we've got more
(non-JavaScript) stuff from July, August and September adding pressure. We
have to do better.

-- Timo

[1]
https://phabricator.wikimedia.org/phame/post/view/203/production_excellence_22_june_2020/


Re: [Wikitech-l] TechCom board review 2020-09-07

2020-09-11 Thread Krinkle
The minutes from TechCom's triage meeting on 2020-09-09.

Present: Daniel K, Dan A, Tim S, Timo T, Kate C.

== Liberate the @ for AtEase ==
* https://phabricator.wikimedia.org/T253461
* Checking in about next steps.
* As a coding convention, it doesn’t need our involvement unless there’s a
conflict, strategic impact, etc. Talk page?
* TT: Will reach out on the talk page as well.

== RFC: Parsoid Extension API ==
* https://phabricator.wikimedia.org/T260714
* TT: Concerning that the RFC was filed this late, but there are no
meaningful comments yet and we are thinking about Last Call. Believe this
is more perception than an actual problem. The engagement hasn’t been
captured on the task very well. This is an initial attempt, to be used for
WMF-maintained extensions; as we get closer to this parser being the
default, there will be more changes. Different from usual processes, how do
we accommodate that?
* DK: Establish a baseline, then will file another RFC later. Need to
clearly mark as unstable.

== RFC: PHP microservice for containerized shell execution ==
* https://phabricator.wikimedia.org/T260330
* Scheduled as IRC meeting for next week.

== Next week IRC office hours ==
RFC review meeting scheduled for next week: RFC: Containerized shell
execution service https://phabricator.wikimedia.org/T260330.
Meeting at 2020-09-10 21:00 UTC (14:00 PT, 23:00 CEST) on Freenode in the
#wikimedia-office channel.

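You can also find our meeting minutes at
https://www.mediawiki.org/wiki/Wikimedia_Technical_Committee/Minutes

If you prefer you can subscribe to our newsletter here
https://www.mediawiki.org/wiki/Newsletter:TechCom_Radar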

-- Timo


Re: [Wikitech-l] Do we still need AtEase?

2020-09-09 Thread Krinkle
I've tuned the proposal based on the feedback gathered.
Please take a moment to re-review, especially if you previously had
concerns or otherwise prefer AtEase.

-- Timo


On Sun, May 24, 2020 at 6:46 PM Krinkle  wrote:

> It does the same as the @ operator, except that it takes care to prevent a
> very bad bug that existed before PHP 7. Details at
> https://phabricator.wikimedia.org/T253461
>
> If there are other issues or benefits, please write them on the task. The
> overhead of AtEase is pretty minor, so really any benefit at all is likely
> to tip the balance toward keeping it. But, in the event that there isn't
> any, then perhaps we should slowly phase it out.
>
> Best,
> -- Timo
>
>


Re: [Wikitech-l] Extensions which host their code in-wiki

2020-08-28 Thread Krinkle
Hi Zoran,

I like this idea! For several reasons, most of which are not new.

What led you to wanting to archive them now? What are you looking to solve
or improve?

-- Krinkle

On Fri, 28 Aug 2020 at 18:41, Zoran Dori  wrote:

> Hello everyone,
>
> on MediaWiki.org I've found a category of extensions which code is stored
> on the pages of mentioned wiki. [1]
>
> Should I (or someone else) archive those pages, and items on Wikidata.org?
>
> Best regards,
> Zoran.
>
> [1]
> https://www.mediawiki.org/wiki/Category:Extensions_which_host_their_code_in-wiki


[Wikitech-l] Production Excellence #22: June 2020

2020-07-22 Thread Krinkle
 Read on Phabricator at
https://phabricator.wikimedia.org/phame/post/view/203/
---

How’d we do in our strive for operational excellence last month? Read on to
find out!

##    Month in review

* 4 documented incidents in June. [1]
* 37 new production errors were filed and 27 were closed. [2] [3]
* 72 recent production errors still open (up from 68).
* 203 total Wikimedia-prod-error tasks currently open (up from 192). [4]

For more about recent incidents and pending actionables see Wikitech and
Phabricator, at https://wikitech.wikimedia.org/wiki/Incident_documentation
and https://phabricator.wikimedia.org/project/view/4758/

---

##    Outstanding errors

Breakdown of the errors reported in June that are still open today:

* (Needs owner) / Newsletter extension: Unexpected locking SELECT query.
https://phabricator.wikimedia.org/T253926
* (Needs owner) / FlaggedRevs extension: Unable to submit review of page
due to bad fr_page_id record. https://phabricator.wikimedia.org/T256296
* Editing team / MassMessage extension: Delivery fails due to system user
conflict. https://phabricator.wikimedia.org/T171003
* Parsing team / Parsoid: Pagebundle data unavailable due to a bad UTF-8
string. https://phabricator.wikimedia.org/T236866
* Growth team / Recent changes: Update for ActiveUsers data failing due to
deadlock. https://phabricator.wikimedia.org/T255059
* Growth team / GrowthExperiments: Issue with question display on personal
homepage. https://phabricator.wikimedia.org/T255616
* Language team / Translate extension: Update jobs fail due to invalid
function call. https://phabricator.wikimedia.org/T255669
* Language team / ContentTranslation: Save action fails due to duplicate
insert query. https://phabricator.wikimedia.org/T256230
* Core Platform team / Content handling: Incompatible content type during
content merge/stash. https://phabricator.wikimedia.org/T255700
* Core Platform team / Monolog: API usage logs and error logs sometimes
missing due to socket failure. https://phabricator.wikimedia.org/T255578
* Search Platform team / WikibaseCirrus: Elevated error levels from
EntitySearchElastic warnings. https://phabricator.wikimedia.org/T255658
* Wikidata / API: Generator query fails due to invalid API result format.
https://phabricator.wikimedia.org/T254334
* Wikidata / API: EntityData query emits warning about bad RDF.
https://phabricator.wikimedia.org/T255054
* Wikidata / Repo: Entity relation update jobs fail due to deadlock.
https://phabricator.wikimedia.org/T255706

---

##    Trends

Take a look at the workboard and look for tasks that could use your help.
→  https://phabricator.wikimedia.org/tag/wikimedia-production-error/

Summary over recent months:

* July 2019 (5 of 18 tasks left): Two tasks closed.
* August (1 of 14 tasks left): Another task closed, only one remaining! 
* September (5 of 12 tasks left): Two tasks closed.
* October (6 of 12 tasks left): no change.
* November (3 of 5 tasks left): Another task closed.
* December (5 of 9 tasks left): no change.
* January 2020 (5 of 7 tasks left): no change.
* February (4 of 7 tasks left): no change.
* March (2 of 2 tasks left): no change.
* April (11 of 14 tasks left): Three tasks closed.
* May (11 tasks left): Three tasks closed.
* June: 14 new tasks survived the month of June. ⚠️

At the end of May the number of open production errors over recent months
was 68. Of those, 10 got closed, but with 14 new tasks from June still
open, the total has grown further to 72.

The workboard had 192 open tasks last month; that number has increased
again, to 203 open tasks now (this includes tasks from 2019 and earlier).

---

##    Thanks!

Thank you to everyone who helped by reporting, investigating, or
resolving problems in Wikimedia production. Thanks!

Until next time,

– Timo Tijhof

---

Footnotes:

[1] Incidents. –
https://wikitech.wikimedia.org/wiki/Incident_documentation#2020

[2] Tasks created. –
https://phabricator.wikimedia.org/maniphest/query/VTpmvaJLYVL1/#R

[3] Tasks closed. –
https://phabricator.wikimedia.org/maniphest/query/qn5yeURqyl3D/#R

[4] Open tasks. –
https://phabricator.wikimedia.org/maniphest/query/Fw3RdXt1Sdxp/#R

[Wikitech-l] New design for Stable interface policy

2020-06-28 Thread Krinkle
Based on recent CRs and feedback I've redesigned the recently updated
Stable interface policy.

Old:
https://www.mediawiki.org/w/index.php?title=Stable_interface_policy&oldid=3845486

New:
https://www.mediawiki.org/wiki/User:Krinkle/Stable_interface_policy

I hope this is more approachable and easier to navigate from the various
perspectives and use cases where you might need it (e.g. when reviewing
code, writing code in core, writing code in an extension, planning out the
deprecation roadmap for a feature, etc.).

This page is currently a draft because during the rewrite I noticed a few
minor inconsistencies and open questions in the nominal text. These are
being fleshed out through an RFC at <
https://phabricator.wikimedia.org/T255803>. Once we've decided on those
details, I'll publish the new design to the canonical page.

-- Timo

[Wikitech-l] Production Excellence #21: May 2020

2020-06-24 Thread Krinkle
 Read on Phabricator at
https://phabricator.wikimedia.org/phame/post/view/198/
---

How’d we do in our strive for operational excellence last month? Read on to
find out!

##    Month in numbers

* 5 documented incidents in May. [1]
* 28 new production error tasks filed in May. [2] [3]
* 68 recent production errors currently open (up from 61).
* 193 currently open Wikimedia-prod-error tasks (up from 178). [4]

For more about recent incidents and pending actionables see Wikitech and
Phabricator, at https://wikitech.wikimedia.org/wiki/Incident_documentation
and https://phabricator.wikimedia.org/project/view/4758

##    Outstanding reports

Take a look at the workboard and look for tasks that could use your help.
→  https://phabricator.wikimedia.org/tag/wikimedia-production-error/

Breakdown of recent months:

* July 2019: One task closed, 7 of 18 tasks left. ⚠️
* August: 2 of 14 tasks left (unchanged).
* September: 7 of 12 tasks left (unchanged).
* October: 4 of 12 tasks left (unchanged).
* November: 4 of 5 tasks left (unchanged).
* December: 4 of 9 tasks left (unchanged).
* January 2020: 5 of 7 tasks left (unchanged).
* February: Two tasks closed, 4 of 7 tasks left. ⚠️
* March: 2 of 2 tasks left (unchanged).
* April: 14 of 14 tasks left (unchanged).
* May: 14 new tasks survived the month of May.

At the end of April the total of open production errors over recent months
was 61. Of those, 7 got closed, but with 14 new tasks from May still open,
the total has grown to 68.

The workboard had 178 open tasks in April; that number has since risen
steeply to 192 open tasks (this includes June 2020 so far, and pre-2019
tasks).

##    Thanks!

Thank you to everyone who helped by reporting, investigating, or
resolving problems in Wikimedia production. Thanks!

Until next time,

– Timo Tijhof

---

Footnotes:

[1] Incidents. –
https://wikitech.wikimedia.org/wiki/Incident_documentation#2020

[2] Tasks created. –
https://phabricator.wikimedia.org/maniphest/query/7Z4Us2BS02Uo/#R

[3] Tasks closed. –
https://phabricator.wikimedia.org/maniphest/query/FoIFMu5UO8pw/#R

[4] Open tasks. –
https://phabricator.wikimedia.org/maniphest/query/Fw3RdXt1Sdxp/#R

[Wikitech-l] Help with memory leaks in a C# Mono application

2020-06-07 Thread Krinkle
Hi,

I'm looking for help finding the cause(s) of a memory leak in the
countervandalism CVNBot project. These bots provide filtered
recent-changes information to patrollers on various big wikis
(enwiki, commons, wikidata[1]) as well as from groups of small wikis
combined (SWMT[1]).

The bot is a Mono application written in C#, which runs in Wikimedia Cloud.

We're finding that as of last November the bots have started leaking much
more memory than before, possibly simply due to the wikis being more active
(especially Wikidata). This leads to frequent outages due to bots becoming
lagged, unresponsive for hours, etc.

I'm quite the noob when it comes to C#, and unfortunately there aren't
many maintainers left of this project (I inherited it to keep it running,
but beyond that I'm kinda out of my depth here). If you know C# well, or
have experience with VS2019, or just are looking for a challenge, check
below for more info and share ideas :)

https://github.com/countervandalism/CVNBot/issues/13

-- Timo

[1] https://meta.wikimedia.org/wiki/Countervandalism_Network/Bots

[Wikitech-l] RFC: Discourage use of ENUM in db schemas

2020-06-03 Thread Krinkle
Link .

Various problems with ENUM have been presented,
and it appears that its use cases may be better accommodated
in db schemas for MediaWiki by other means.

I'm looking for the following:
* Success stories with ENUM (cases where it's not only good but better than
alternatives).
* Horror stories or scenarios we have not yet covered.

These would influence the direction. Our options are currently:
* Keep allowing it and encourage it for cases where it's good.
* Start discouraging it in general, but document cases for which it makes
sense.
* Disallow by default (in policy). New usage would require approval through
some other method (e.g. DBA feedback or some such).

More at .

-- Timo

[Wikitech-l] (Low prio) Do we still need AtEase?

2020-05-24 Thread Krinkle
It does the same as the @ operator, except that it takes care to prevent a
very bad bug that existed before PHP 7. Details at
https://phabricator.wikimedia.org/T253461

If there are other issues or benefits, please write them on the task. The
overhead of AtEase is pretty minor, so really any benefit at all is likely
to tip the balance toward keeping it. But, in the event that there isn't
any, then perhaps we should slowly phase it out.

Best,
-- Timo

[Wikitech-l] Production Excellence #20: April 2020

2020-05-14 Thread Krinkle
 Read on Phabricator at
https://phabricator.wikimedia.org/phame/post/view/193/
---

How are we doing on that strive for operational excellence during these
unprecedented times?

##   Numbers for March and April

* 3 documented incidents. [1]
* 60 new Wikimedia-prod-error reports. [2]
* 58 Wikimedia-prod-error reports closed. [3]
* 178 currently open Wikimedia-prod-error reports in total. [4]

For more about recent incidents and pending actionables see Wikitech and
Phabricator, at https://wikitech.wikimedia.org/wiki/Incident_documentation
and https://phabricator.wikimedia.org/project/view/4758/

---

##   Outstanding reports

Breakdown of recent months:

* April 2019: Two reports closed, 2 of 14 left.
* May: (All clear!)
* June: 4 of 11 left (unchanged).
* July: 8 of 18 left (unchanged).
* August: 2 of 14 reports left (unchanged).
* September: 7 of 12 left (unchanged).
* October: Two reports closed, 4 of 12 left.
* November: One report closed, 4 of 5 left.
* December: Two reports closed, 4 of 9 left.
* January 2020: One report closed, 5 of 7 reports left.
* February: One report closed, 6 of 7 reports left.
* ❇️ March: 2 new reports survived the month of March.
* ❇️ April: 13 new reports survived the month of April.

Take a look at the workboard and look for tasks that could use your help.
→  https://phabricator.wikimedia.org/tag/wikimedia-production-error/

At the end of February the total of open reports over recent months was 58.
Of those, 12 got closed, but with 15 new reports from March/April still
open, the total is now up at 61 open reports.

The workboard overall (which includes pre-2019 tasks) has 178 tasks open.
This is actually down a bit for the first time since October: December was
at 196, January at 198, February at 199, and April is now at 178.
This was largely due to the Release Engineering and Core Platform teams
closing out forgotten reports that have since been resolved or otherwise
obsoleted.

---

##   Thanks!

Thank you to everyone who helped by reporting, investigating, or resolving
problems in Wikimedia production. Thanks!

Until next time,

– Timo Tijhof

---

Footnotes:
[1] Incidents. – https://wikitech.wikimedia.org/wiki/Incident_documentation
[2] Tasks created. –
https://phabricator.wikimedia.org/maniphest/query/HjopcKClxTfw/#R
[3] Tasks closed. –
https://phabricator.wikimedia.org/maniphest/query/ts62HKYPBxod/#R
[4] Open tasks. –
https://phabricator.wikimedia.org/maniphest/query/Fw3RdXt1Sdxp/#R

[Wikitech-l] Update on ResourceLoader debug mode

2020-04-02 Thread Krinkle
Hi,

I've written down some observations, thoughts and ideas for
ResourceLoader's debug mode.
See https://phabricator.wikimedia.org/T85805.
Do you like how it works today? Have suggestions for how it could be
better? Let me know here, or on-task :-)

I plan for it to be an iterative process. Mainly I'm looking to make sure
that we don't diverge the paths if we don't need to, and that if we do
split, that the new path is good enough for most use cases.

-- Timo

[Wikitech-l] Production Excellence #19 (February 2020)

2020-03-24 Thread Krinkle
 Read on Phabricator at
https://phabricator.wikimedia.org/phame/post/view/192/
---

How’d we do in our strive for operational excellence last month? Read on to
find out!

##   Month in numbers

* 8 documented incidents. [1]
* 27 new Wikimedia-prod-error reports. [2]
* 26 Wikimedia-prod-error reports closed. [3]
* 199 currently open Wikimedia-prod-error reports in total. [4]

With a median of 4–5 documented incidents per month (over the last three
years), there were a fairly large number of them this past month.

To read more about these incidents and pending actionables, check <
https://wikitech.wikimedia.org/wiki/Incident_documentation#2020>, or
explore Wikimedia incident stats at <https://codepen.io/Krinkle/full/wbYMZK>
(interactive).

---

##   Unset vs array splice

Our error monitor (Logstash) received numerous reports about an “Undefined
offset” error from the OATHAuth extension. This extension powers the
Two-factor auth (2FA) login interface on Wikipedia (<
https://meta.wikimedia.org/wiki/Help:Two-factor_authentication>).

ItSpiderman and Reedy investigated the problem. The error message:

PHP Notice: Undefined offset: 8
at /srv/mediawiki/extensions/OATHAuth/src/Key/TOTPKey.php:188

This error means that the code was accessing item number 8 from a list (an
array), but the item does not exist. Normally, when a “2FA scratch token”
is used, we remove it from a list, and save the remaining list for next
time.

The code used the `count()` function to compute the length of the list, and
used a for-loop to iterate through the list. When the code found the user’s
token, it used the `unset( $list[$num] )` operation to remove token $num
from the list, and then save $list for next time.

The problem with removing a list item in this way is that it leaves a
“gap”. Imagine a list with 4 items, like [ 1: …, 2: …, 3: … , 4: … ]. If we
unset item 2, then the remaining list will be [ 1: …, 3: …, 4: … ]. The
next time we check this list, the length of the list is now 3 (so far so
good!), but the for-loop will access the items as 1-2-3. The code would not
know that 3 comes after 1, causing an error because item 2 does not exist.
And, the code would not even look at item 4!

When a user used their first ever scratch token, everything worked fine.
But from their second token onwards, the tokens could be rejected as
“wrong” because the code was not able to find them.

To avoid this bug, we changed the code to use
`array_splice( $list, $num, 1 )` instead of `unset( $list[$num] )`. The
important thing about array_splice is that it renumbers the items in the
list, leaving no gaps.

 – https://phabricator.wikimedia.org/T244308 /
https://gerrit.wikimedia.org/r/570253
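
The difference between the two operations can be reproduced in a few lines
of plain PHP. This is a minimal sketch and not the actual OATHAuth code: the
`$tokens` list and the `findToken()` helper are made up for illustration.
Note that PHP numbers list items from 0, whereas the description above
counts from 1.

```php
<?php
// Hypothetical scratch-token list; the real OATHAuth data looks different.
$tokens = [ 'aaa', 'bbb', 'ccc', 'ddd' ];

/**
 * Search the list with a count()-based for-loop, as described above.
 * Returns the position of $needle, or null if the loop never finds it.
 */
function findToken( array $list, string $needle ): ?int {
	$length = count( $list );
	for ( $i = 0; $i < $length; $i++ ) {
		// Reading a missing key directly would emit the "Undefined offset"
		// notice; array_key_exists() keeps this demo quiet.
		if ( array_key_exists( $i, $list ) && $list[$i] === $needle ) {
			return $i;
		}
	}
	return null;
}

// unset() removes the item but leaves a gap: the keys are now 0, 2, 3.
$withUnset = $tokens;
unset( $withUnset[1] );

// array_splice() removes the item and renumbers: the keys are now 0, 1, 2.
$withSplice = $tokens;
array_splice( $withSplice, 1, 1 );

var_dump( array_keys( $withUnset ) );
var_dump( findToken( $withUnset, 'ddd' ) );
var_dump( array_keys( $withSplice ) );
var_dump( findToken( $withSplice, 'ddd' ) );
```

With the gap in place, the loop runs over positions 0, 1 and 2, so the token
stored under key 3 is never examined and looks "wrong" to the caller. After
array_splice the same token sits at position 2 and is found.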

---

##   Outstanding reports

Take a look at the workboard and look for tasks that might need your help.
The workboard lists error reports, grouped by the month in which they were
first observed.

→  https://phabricator.wikimedia.org/tag/wikimedia-production-error/

Breakdown of recent months:

* March: 3 of 10 reports left (unchanged). ⚠️
* April: 4 of 14 left (unchanged).
* May: (All clear!)
* June: 4 of 11 left (unchanged).
* July: 8 of 18 left (unchanged).
* August: Two reports closed! 2 of 14 reports left.
* September: One report closed, 7 of 12 left.
* October: Two reports closed, 6 of 12 left.
* November: 5 of 5 left (unchanged).
* December: 6 of 9 left (unchanged).
* January: One report closed, 6 of 7 reports left.
* February: 7 new reports survived the month of February.

Last month’s total over recent months was 57 open reports. Of those, 6 got
closed, but with 7 new reports from February still open, the total is now
up at 58 open reports.

---

##   Thanks!

Thank you to everyone who helped by reporting, investigating, or resolving
problems in Wikimedia production.

Together we’re getting there!

Until next time,

– Timo Tijhof




Footnotes:

[1] Incidents. –
https://wikitech.wikimedia.org/wiki/Incident_documentation#2020

[2] Tasks created. –
https://phabricator.wikimedia.org/maniphest/query/aT3iqdM0EJKW/#R

[3] Tasks closed. –
https://phabricator.wikimedia.org/maniphest/query/jVexIrtOPkcX/#R

[4] Open tasks. –
https://phabricator.wikimedia.org/maniphest/query/47MGY8BUDvRD/#R
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Production Excellence (January 2020)

2020-02-28 Thread Krinkle
 Read on Phabricator at
https://phabricator.wikimedia.org/phame/post/view/180/
---

How’d we do in striving for operational excellence last month? Read on to
find out!

##  Month in numbers

* 3 documented incidents. [1]
* 26 new Wikimedia-prod-error reports. [2]
* 26 Wikimedia-prod-error reports closed. [3]
* 198 currently open Wikimedia-prod-error reports in total. [4]

To read more about these incidents and pending actionables, check <
https://wikitech.wikimedia.org/wiki/Incident_documentation#2020>, or
explore the interactive Wikimedia incident stats.

---

##  Paradoxical array key

Wikimedia encountered several Zend engine bugs that could corrupt a PHP
program at run-time, during the upgrade from HHVM to PHP 7.2. (Some of
these bugs are still being worked on.) One of the bugs we fixed last month
was particularly mysterious. The investigation was led by Antoine (Hashar)
and Tim Starling.

MediaWiki would create an array in PHP and add a key-value pair to it. We
could iterate this array, and see that our key was there. Moments later, if
we tried to retrieve the value from that same array, sometimes the key
would no longer exist!

After many ad-hoc debug logs, core dumps, and GDB sessions, the problem was
tracked down to the string interning system of Zend PHP. String interning
is a memory reduction technique. It means we only store one copy of a
character sequence in RAM, even if many parts of the code use the same
character sequence. For example, the words “user” and “edit” are frequently
used in the MediaWiki codebase. One such sequence is the empty string
(“”), which is also used a lot in our code. This is the string we found
disappearing most often from our PHP arrays. This bug affected several
components, including Wikibase, the wikimedia/rdbms library, and
ResourceLoader.

Tim used a hardware watchpoint in GDB, and traced the root cause to the
Memcached client for PHP. The php-memcached client would “free” a string
directly from the internal memory manager after doing some work. It did
this even for “interned” strings that other parts of the program may still
be depending on.
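
To illustrate why freeing an interned string is dangerous, here is a toy model in Python. It is only a sketch of the idea, not how the Zend engine or php-memcached work internally: the `StringPool` class, its slot numbers, and the dict standing in for a PHP array are all assumptions made for the example.

```python
class StringPool:
    """Toy string-interning pool: each distinct text is stored only once."""

    def __init__(self):
        self._slots = []   # slot number -> text (None once freed)
        self._index = {}   # text -> slot number

    def intern(self, text):
        """Return the slot holding `text`, storing it only if it is new."""
        if text not in self._index:
            self._slots.append(text)
            self._index[text] = len(self._slots) - 1
        return self._index[text]

    def read(self, slot):
        return self._slots[slot]

    def free(self, slot):
        """Release a slot without checking whether anyone else still uses it."""
        del self._index[self._slots[slot]]
        self._slots[slot] = None


pool = StringPool()

# Two unrelated components intern the empty string and share one slot.
array_key = pool.intern("")
client_string = pool.intern("")
shared = array_key == client_string

# A stand-in for a PHP array that refers to its key by interned slot.
array = {array_key: "value"}

# The client finishes its work and frees "its" string.
pool.free(client_string)

# The array still refers to the old slot, but the text behind it is gone,
# and a fresh lookup for "" no longer resolves to the key the array holds.
key_text_after_free = pool.read(array_key)
lookup_hit = pool.intern("") in array
```

In this model the array still contains an entry, yet looking it up by the same text fails, which is loosely the kind of paradox described above.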

Effie and Giuseppe backported the upstream fix to our php-memcached package
and deployed it to production. Thanks! —
https://phabricator.wikimedia.org/T232613

---

##   Outstanding reports

Take a look at the workboard and look for tasks that might need your help.
The workboard lists error reports, grouped by the month in which they were
first observed.

→  https://phabricator.wikimedia.org/tag/wikimedia-production-error/

Breakdown of recent months (past two weeks not included):

* March: 3 of 10 reports left (unchanged). ⚠️
* April: Two reports closed, 4 of 14 left.
* May: (All clear!)
* June: Two reports closed, 4 of 11 left.
* July: Four reports closed, 8 of 18 left.
* August: 4 of 14 reports left (unchanged).
* September: One report closed, 8 of 12 left.
* October: 8 of 12 left (unchanged).
* November: 5 of 5 left (unchanged).
* December: Three reports closed, 6 of 9 left.
* January: 7 new reports survived the month of January.

There are a total of 57 reports filed in recent months that remain open.
This is down from 62 last month.

---

##  Thanks!

Thank you to everyone who helped by reporting, investigating, or resolving
problems in Wikimedia production. Thanks!

Until next time,

– Timo Tijhof

---

Footnotes:

[1] Incidents. –
https://wikitech.wikimedia.org/wiki/Incident_documentation#2020

[2] Tasks created. –
https://phabricator.wikimedia.org/maniphest/query/qfCVpWqGX0tJ/#R

[3] Tasks closed. –
https://phabricator.wikimedia.org/maniphest/query/ndeCQjeJ6UNr/#R

[4] Open tasks. –
https://phabricator.wikimedia.org/maniphest/query/47MGY8BUDvRD/#R

[Wikitech-l] Remove "profileoutput" debug log channel from MediaWiki core

2020-02-27 Thread Krinkle
TLDR: If you use $wgDebugLogGroups['profileoutput'] in LocalSettings.php as
a way to collect performance profiles from Tideways/XHProf, this will no
longer receive messages in MediaWiki 1.35 and later.

---

Hi,

I've been auditing the Profiler component in MediaWiki and deprecating or
removing various legacy features that don't fit the current model very
well, or that seem to be broken, unused or undocumented. [1] The objective
here is to keep offering the same functionality for developers, but in a
way that's easier to use, with fewer moving parts, and lower maintenance
costs for us.

Today I'm writing about the "profileoutput" debug log channel. As part of
the PSR-3/Monolog refactor many years ago, this feature was grandfathered
into the LegacyLogger.

The $wgProfiler variable controls which collector and output are used at
run-time. For example, you can collect the call graph with XHProf/Tideways,
and then output it to an HTML comment, or dump to a file on disk, etc. Any
number of outputs can be implemented and enabled.

The "profileoutput" debug log channel was effectively another way of
achieving the same thing, by assigning $wgDebugLogGroups['profileoutput']
to a file path.

If you're currently using this, see
https://phabricator.wikimedia.org/T245835 for how to make this work via
$wgProfiler instead.

-- Timo

[1] https://phabricator.wikimedia.org/T231366

[Wikitech-l] Production Excellence (December 2019)

2020-01-09 Thread Krinkle
Read on Phabricator at https://phabricator.wikimedia.org/phame/post/view/179
---

How’d we do in striving for operational excellence in November and
December? Read on to find out!

##  Month in numbers

* 0 documented incidents in November, 5 incidents in December. [1]
* 17 new Wikimedia-prod-error reports. [2]
* 23 Wikimedia-prod-error reports closed. [3]
* 190 currently open Wikimedia-prod-error reports in total. [4]

November had zero reported incidents. Prior to this, the last month with no
documented incidents was December 2017. To read about past incidents and
unresolved actionables, check <
https://wikitech.wikimedia.org/wiki/Incident_documentation#2019>.

Explore Wikimedia incident graphs (interactive) at <
https://codepen.io/Krinkle/full/wbYMZK>.

---

## *️⃣ Many dots, do not a query make!

David Causse investigated a flood of exceptions from SpecialSearch […].

Read the story at <https://phabricator.wikimedia.org/phame/post/view/179>.

---

##   Outstanding reports

Take a look at the workboard and look for tasks that might need your help.
The workboard lists error reports, grouped by the month in which they were
first observed.

→  https://phabricator.wikimedia.org/tag/wikimedia-production-error/

Breakdown of recent months (past two weeks not included):

* March: 3 of 10 reports left (unchanged). ⚠️
* April: Three reports closed, 6 of 14 left.
* May: (All clear!)
* June: Three reports closed, 6 of 11 left.
* July: One report closed, 12 of 18 left.
* August: Two reports closed, 4 of 14 left.
* September: One report closed, 9 of 12 left.
* October: Four reports closed, 8 of 12 left.
* November: 5 new reports survived the month of November.
* December: 9 new reports survived the month of December.

---

##  Thanks!

Thank you to everyone who helped by reporting, investigating, or resolving
problems in Wikimedia production.

Until next time,

– Timo Tijhof

---

Footnotes:

[1] Incidents. –
https://wikitech.wikimedia.org/wiki/Incident_documentation#2019

[2] Tasks created. –
https://phabricator.wikimedia.org/maniphest/query/AFDaPqjd5PTe/#R

[3] Tasks closed. –
https://phabricator.wikimedia.org/maniphest/query/YkIxmhRvEZ8R/#R

[4] Open tasks. –
https://phabricator.wikimedia.org/maniphest/query/47MGY8BUDvRD/#R

[Wikitech-l] WikimediaDebug v2 is here! – Staging, perf profiling, verbose log

2019-12-16 Thread Krinkle
WikimediaDebug is a browser extension to help debug WMF wikis in
production. [1]

TL;DR:
* The popup has been redesigned (including dark mode support).
* Support for Beta Cluster (including XHGui).
* A new "Inline profile" option.

For more details and a general overview of how WikimediaDebug works:
See https://phabricator.wikimedia.org/phame/live/7/post/183/


-- Timo

[1]
Firefox:
https://addons.mozilla.org/en-US/firefox/addon/wikimedia-debug-header/
Chrome:
https://chrome.google.com/webstore/detail/wikimediadebug/binmakecefompkjggiklgjenddjoifbb

[Wikitech-l] New diff themes in Gerrit Old UI (Colorblind-friendly!)

2019-12-11 Thread Krinkle
Hi,

I've implemented two new diff themes in Gerrit's old UI. They are meant to
address the following two issues:

1. The default diff theme used in Gerrit places dark red and purple text on
a dark red background (in removed code). This can be hard to read,
especially on a yellow-tinted monitor (e.g. Redshift, Night Shift, f.lux,
etc.).

2. The general use of green/red, while readable, is hard to distinguish for
people with certain types of color-blindness.

== Theme 1: "elegant"
(Screenshot at https://phabricator.wikimedia.org/T232893)

This theme adopts the MediaWiki diff styles (yellow and blue), which means:
* It only colours the background of the intra-line characters that were
added/changed/removed.
* The rest of the line uses a clean white background, thus providing full
colour contrast for all other syntax-highlighted code.
* The boundary of the added/removed blocks is indicated by a border (not by
absence of a background shade).
* The unchanged code lines are lightly shaded in grey (instead of the
default where these less important lines are given a bright white
background).

In other words, just like MediaWiki :)

== Theme 2: "eclipse"
(Screenshot at https://phabricator.wikimedia.org/T232893)

* The background greens and reds are lighter overall, giving stronger colour
contrast. The background reds have moved closer to orange, and the greens
have moved closer to cyan/blue.

Overall this theme is still very close to the default Gerrit theme, and
might be a good option if you've found the readability a concern but would
rather not make as much of a radical change (yet :D).

== How to change my diff theme?

From any diff in Gerrit:
1. Click the gear icon in the top-right corner.
2. Select a Theme. (This provides a live preview!)
3. Press "Save" if you like it.

Alternatively, you can change it (without live preview) from the Settings
page:
https://gerrit.wikimedia.org/r/#/settings/diff-preferences

== What about the new UI?

The new UI already has better colour contrast in its diff styles, with
lighter background reds and greens by default.

Unfortunately the new UI has removed both the ability to theme the syntax
highlighting and the ability to theme the diff colours on top of that.
Neither is changeable in the new UI currently.

For the time being I'm a holdout in the old UI as I'm unable to use review
dashboards in the new UI (until we upgrade to 2.16). Meanwhile this is
beneficial to myself and anyone else in a similar position.

-- Timo

[Wikitech-l] Production Excellence (October 2019)

2019-11-07 Thread Krinkle
 Read on Phabricator at
https://phabricator.wikimedia.org/phame/post/view/178/
---

How’d we do in striving for operational excellence last month? Read on to
find out!

##  Month in numbers

* 3 documented incidents. [1]
* 33 new Wikimedia-prod-error reports. [2]
* 30 Wikimedia-prod-error reports closed. [3]
* 207 currently open Wikimedia-prod-error reports in total. [4]

There were three recorded incidents last month, which is slightly below our
median of the past two years. Explore this data: <
https://codepen.io/Krinkle/full/wbYMZK>. To read more about these
incidents, their investigations, and pending actionables, check
https://wikitech.wikimedia.org/wiki/Incident_documentation#2019

---

## *️⃣ To Log or not To Log

MediaWiki uses the PSR-3 compliant Monolog library to send messages to
Logstash (via rsyslog and Kafka [5]). These messages are used to
automatically detect (by quantity) when the production cluster is in an
unstable state. For example, due to an increase in application errors when
deploying code, or if a backend system is failing. Two distinct issues
hampered the storing of these messages this month, and both affected us
simultaneously.

*Elasticsearch mapping limit*
The Elasticsearch storage behind Logstash optimises responses to Logstash
queries with an index. This index has an upper limit to how many distinct
fields (or columns) it can have. When reached, messages with fields not yet
in the index are discarded. Our Logstash indexes are sharded by date and
source (one for “mediawiki”, one for “syslog”, and one for everything else).

This meant that an error message was only stored if all of its fields had
already been used by other messages stored that day, or if that day’s
columns weren’t yet fully taken when it first arrived. As a result, a
seemingly random subset of error messages was rejected for a full day.
Each day, a given kind of error got a new chance at reserving its columns,
so long as it was triggered early enough.
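
The effect of such a field limit can be sketched with a toy model. This is an illustration in Python, not Elasticsearch or Logstash code: the `DailyIndex` class, the field names, and the limit of 4 are assumptions chosen to keep the example small.

```python
class DailyIndex:
    """Toy index that stores a message only if its fields fit the mapping."""

    def __init__(self, field_limit):
        self.field_limit = field_limit
        self.fields = set()   # fields reserved so far today
        self.stored = []
        self.dropped = []

    def add(self, message):
        new_fields = set(message) - self.fields
        if len(self.fields) + len(new_fields) > self.field_limit:
            # Accepting this message would exceed the limit: discard it.
            self.dropped.append(message)
            return False
        self.fields |= new_fields
        self.stored.append(message)
        return True


index = DailyIndex(field_limit=4)

# Early debug messages reserve all four available fields.
index.add({"message": "debug A", "user_id": 1})
index.add({"message": "debug B", "job_type": "x", "wiki": "y"})

# An error arriving later needs a fifth field and is rejected.
late_error_stored = index.add({"message": "Fatal", "exception_trace": "..."})

# A message that only uses already-known fields is still accepted.
known_shape_stored = index.add({"message": "debug C", "wiki": "z"})
```

Which messages survive depends purely on arrival order, which is why the rejected subset looked random from day to day.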

To unblock deployment automation and monitoring of MediaWiki, an interim
solution was devised. The subset of messages from “mediawiki” that deals
with application errors now has its own index shard. These error reports
follow a consistent structure, and contain no free-form context fields. As
such, this index (hopefully) can’t reach its mapping limit or suffer
message loss.

The general index mapping limit was also raised from 1000 to 2000. For now
that means we’re not dropping any non-critical/debug messages. More
information about the incident at https://phabricator.wikimedia.org/T234564.
The general issue of accommodating debug messages in Logstash long-term
is tracked at https://phabricator.wikimedia.org/T180051. Thanks Bartosz,
and Keith Herron.

*Crash handling*
Wikimedia’s PHP configuration has a “crash handler” that kicks in if
everything else fails. For example, when the memory limit or execution
timeout is reached, or if some crucial part of MediaWiki fails very early
on. In that case our crash handler renders a Wikimedia-branded system error
page (separate from MediaWiki and its skins). It also increments a counter
metric for monitoring purposes, and sends a detailed report to Logstash. In
migrating the crash handler from HHVM to PHP7, one part of the puzzle was
forgotten: the Logstash configuration that forwards these reports from
php-fpm’s syslog channel to the one for mediawiki.

As such, our deployment automation and several Logstash dashboards were
blind to a subset of potential fatal errors for a few days. Regressions
during that week were instead found by manually digging through the raw
feed of the php-fpm channel. As a temporary measure, Scap was updated to
consider the php-fpm channel as well in its automation that decides
whether a deployment is “green”.

We’ve created new Logstash configurations that forward PHP7 crashes in a
similar way to how we did for HHVM in the past. Bookmarked MW
dashboards/queries you have for Logstash now provide a complete picture
once again. Thanks Effie, and Cole White! –
https://phabricator.wikimedia.org/T234283

---

##   Outstanding reports

Take a look at the workboard and look for tasks that might need your help.
The workboard lists error reports, grouped by the month in which they were
first observed.

→  https://phabricator.wikimedia.org/tag/wikimedia-production-error/

Or help someone that’s already started with their patch:
→  https://phabricator.wikimedia.org/maniphest/query/CFzrDj3vFbE_/#R

Breakdown of recent months (past two weeks not included):

* March: 1 report fixed. (3 of 10 reports left).
* April: 8 of 14 reports left (unchanged). ⚠️
* May: (All clear!)
* June: 9 of 11 reports left (unchanged). ⚠️
* July: 13 of 18 reports left (unchanged).
* August: 2 reports were fixed! (6 of 14 reports left).
* September: 2 reports were fixed! (10 of 12 new reports left).
* October: 12 new reports survived the month of October.

---

##  Thanks!

Tha

[Wikitech-l] Fresnel 0.5.0 for MediaWiki (perf testing in CI)

2019-10-24 Thread Krinkle
This week Fresnel 0.5.0 was released and deployed to Jenkins.

Highlighted changes:

   - Add support for Mann–Whitney U test. [1] –
   https://phabricator.wikimedia.org/T223977,
   https://en.wikipedia.org/wiki/Mann-Whitney_U_test
   - Switch regression detection from diffStdev to diffMannWhitney (for
   Paint Timing metrics).
   - Update Chromium from 73.0 to 77.0.
   - Enable Gzip for static files in web server behind Fresnel and Quibble.
   – https://gerrit.wikimedia.org/r/539427
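
For readers unfamiliar with the Mann–Whitney U test mentioned above, here is a minimal sketch of how the U statistic is computed. It is written in Python for illustration and is not Fresnel's implementation; the function name and the sample timings are made up, and no significance threshold is applied.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b): for each sample, the number of pairs it "wins".

    Every value in sample_a is compared with every value in sample_b.
    A larger value counts as one win, a tie counts as half a win.
    """
    wins_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                wins_a += 1.0
            elif a == b:
                wins_a += 0.5
    wins_b = len(sample_a) * len(sample_b) - wins_a
    return wins_a, wins_b


# Hypothetical paint timings (ms) from runs before and after a change.
before = [100, 102, 98, 101, 99]
after = [110, 108, 112, 109, 111]

u_before, u_after = mann_whitney_u(before, after)

# The test statistic is the smaller of the two. A value near zero means
# the samples barely overlap, which hints at a real shift.
u = min(u_before, u_after)

# Two overlapping samples for comparison.
u_overlap = mann_whitney_u([1, 2, 3], [2, 3, 4])
```

Because the statistic only depends on how the values rank against each other, a single outlier run affects it far less than it would affect a comparison based on means and standard deviations.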


I've written a visual guide for how to open Fresnel's performance report in
your Chrome DevTools locally:

 https://wikitech.wikimedia.org/wiki/Performance/Fresnel#DevTools

-- Timo Tijhof

Change log and tracking task for Fresnel 0.5.0:
https://github.com/wikimedia/fresnel/blob/0.5.0/CHANGELOG.md
https://phabricator.wikimedia.org/T235195

-- Forwarded message -
From: Timo Tijhof 
Subject: Fresnel for MediaWiki (performance testing)
Date: Thu, Mar 7, 2019 at 1:10 AM

You may have noticed something called "mediawiki-fresnel" leaving messages
on Gerrit patches for MediaWiki in the past few days, and wondered what
it's all about. Allow me to introduce Fresnel!

Fresnel is an automation tool for measuring and comparing the client-side
performance of web pages. Fresnel was developed over the past two
quarters and is now ready for action. [1] [2] [3]

To learn more about how to use it, what it offers, and how it works, check
out:
https://wikitech.wikimedia.org/wiki/Performance/Fresnel

Some feature highlights:


   - ⏱  *Metrics* from Navigation Timing, Paint Timing, and Resource Timing
   APIs.
   -   DevTools *Timeline* from CI recording can be viewed locally in
   Chrome.
   -   Recordings take a *screenshot* available in build artefacts.
   -   Scenarios perform a *warmup* and multiple runs for more stable
   metrics.


--
Timo Tijhof

[1] Launch task: https://phabricator.wikimedia.org/T133646
[2] Phabricator project: https://phabricator.wikimedia.org/tag/fresnel/
[3] Task list:
https://phabricator.wikimedia.org/maniphest/query/9w6EAEPPLQ72/#R

[Wikitech-l] Wikimedia production excellence (September 2019)

2019-10-24 Thread Krinkle
 Read on Phabricator at
https://phabricator.wikimedia.org/phame/post/view/173/
---

How’d we do in striving for operational excellence last month? Read on to
find out!

##  Month in numbers

* 5 documented incidents. [1]
* 22 new errors reported. [2]
* 31 error reports closed. [3]
* 213 currently open Wikimedia-prod-error reports in total. [4]

There were five recorded incidents last month, equal to the median for this
and last year. – Explore this data at https://codepen.io/Krinkle/full/wbYMZK

To read more about these incidents, their investigations, and pending
actionables, check
https://wikitech.wikimedia.org/wiki/Incident_documentation#2019

## *️⃣ A Tale of Three Great Upgrades

This month saw three major upgrades across the MediaWiki stack.

*Migrate from HHVM to PHP 7.2*
The client-side switch to toggle between HHVM and PHP 7.2 saw its final
push — from the 50% it was at previously, to 100% of page view sessions on
17 September. The switch further solidified on 24 September when static
MediaWiki traffic followed suit (e.g. API and ResourceLoader). Thanks Effie
and Giuseppe for the final push. – More details at
https://phabricator.wikimedia.org/T219150 and
https://phabricator.wikimedia.org/T176370.

*Drop support for IE6 and IE7*
The RFC to discontinue basic compatibility for the IE6 and IE7 browsers
entered Last Call on 18 September. It was approved on 2 October (T232563).
Thanks to Volker Eckl for leading the sprint to optimise our CSS payloads
by removing now-redundant style rules for IE6-7 compat. – More at
https://phabricator.wikimedia.org/T234582.

*Transition from PHPUnit 4/6 to PHPUnit 8*
With HHVM behind us, our Composer configuration no longer needs to be
compatible with a “PHP 5.6 like” run-time. Support for the real PHP 5.6 was
dropped over 2 years ago, and the HHVM engine supports PHP 7 features.
However, the HHVM engine identifies as “PHP 5.6.999-hhvm”. As such,
Composer refused to install PHPUnit 6 (which requires PHP 7.0+). Instead,
Composer could only install PHPUnit 4 under HHVM (as for PHP 5.6). Our unit
tests have had to remain compatible with both PHPUnit 4 and PHPUnit 6
simultaneously.
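
The version check behind this can be sketched roughly as follows. This is a simplified Python illustration, not Composer's actual constraint solver: `parse_version`, `installable` and the `MINIMUM_PHP` table are hypothetical, and only the two minimums stated in this post are included.

```python
def parse_version(text):
    """Keep the leading numeric part, e.g. '5.6.999-hhvm' -> (5, 6, 999)."""
    numeric = text.split("-", 1)[0]
    return tuple(int(part) for part in numeric.split("."))


# Minimum PHP version per package, as stated in the text above.
MINIMUM_PHP = {"PHPUnit 6": "7.0", "PHPUnit 8": "7.2"}


def installable(runtime_version):
    """List the packages whose minimum PHP version the run-time satisfies."""
    runtime = parse_version(runtime_version)
    return [
        name
        for name, minimum in MINIMUM_PHP.items()
        if runtime >= parse_version(minimum)
    ]


# HHVM supports PHP 7 features, but the version it reports sorts below 7.0.
under_hhvm = installable("5.6.999-hhvm")

# Hypothetical PHP 7.1 and 7.2 run-times for comparison.
under_php71 = installable("7.1.0")
under_php72 = installable("7.2.0")
```

The decision is made on the reported version string alone, so the features the engine actually supports never enter into it.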

Now that we’re fully on PHP 7.2+, our Composer configuration effectively
drops PHP 5.6, 7.0 and 7.1 all at once. This means that we no longer run
PHPUnit tests on multiple PHPUnit versions (PHPUnit 6 only). The upgrade to
PHPUnit 8 (PHP 7.2+) is also unlocked! Thanks Max Sem, Jdforrester and
Daimona for leading this transition. –
https://phabricator.wikimedia.org/T192167

---

##   Outstanding reports

Take a look at the workboard and look for tasks that might need your help.
The workboard lists error reports, grouped by the month in which they were
first observed.

→  https://phabricator.wikimedia.org/tag/wikimedia-production-error/

Or help someone that’s already started with their patch:
→  https://phabricator.wikimedia.org/maniphest/query/pzVPXPeMfRIz/#R

Breakdown of recent months (past two weeks not included):

* February: 1 report was closed. (1 / 5 reports left).
* March: 4 / 10 reports left (unchanged).
* April: 8 / 14 reports left (unchanged). ⚠️
* May: The last 4 reports were resolved. Done!
* June: 9 of 11 reports left (unchanged). ⚠️
* July: 4 reports were fixed! (13 / 18 reports left).
* August: 6 reports were fixed! (8 / 14 reports left).
* September: 12 new reports survived the month of September.

##  Thanks!

Thank you to everyone who helped by reporting, investigating, or
resolving problems in Wikimedia production. Thanks!

Until next time,

– Timo Tijhof

---

Footnotes:

[1] Incidents. –
https://wikitech.wikimedia.org/wiki/Special:PrefixIndex?prefix=Incident+documentation%2F201909=0=1=1

[2] Tasks created. –
https://phabricator.wikimedia.org/maniphest/query/XicVcsN1XkVH/#R

[3] Tasks closed. –
https://phabricator.wikimedia.org/maniphest/query/SXjsllmYHwAO/#R

[4] Open tasks. –
https://phabricator.wikimedia.org/maniphest/query/47MGY8BUDvRD/#R

[Wikitech-l] Wikimedia production excellence (August 2019)

2019-10-02 Thread Krinkle
 Read on Phabricator at
https://phabricator.wikimedia.org/phame/post/view/172
---

How’d we do in striving for operational excellence in August? Read on to
find out!

##   Month in numbers

* 3 documented incidents. [1]
* 42 new Wikimedia-prod-error reports. [2]
* 31 Wikimedia-prod-error reports closed. [3]
* 210 currently open Wikimedia-prod-error reports in total. [4]

The number of recorded incidents in August, at three, was below average for
the year so far. However, in previous years (2017-2018), August also had
2-3 incidents. – Explore the data at https://codepen.io/Krinkle/full/wbYMZK

To read more about these incidents, their investigations, and pending
actionables, check
https://wikitech.wikimedia.org/wiki/Incident_documentation#2019

##  *️⃣ When you have eliminated the impossible...

Reports from Logstash indicated that some user requests were aborted by a
fatal PHP error from the MessageCache class. The user would be shown a
generic system error page. The affected requests didn’t seem to have
anything obvious in common, however. This made it difficult to diagnose.

MessageCache is responsible for fetching interface messages, such as the
localised word “Edit” on the edit button. It calls a “load()” function and
then tries to access the loaded information. However, sometimes the load
function would claim to have finished its work, yet the information was
not there.

When the load function initialises all the messages for a particular
language, it keeps track of this, so as to not do the same a second time.
From whichever angle I looked at this code, no obvious mistakes stood
out. A deeper investigation revealed that two unrelated changes (more than
a year apart) each broke one assumption that was safe to break on its own.
But put together, this seemingly impossible problem emerges. Check out the
details of the investigation at
https://phabricator.wikimedia.org/T208897#5373846.

##    Outstanding reports

Take a look at the workboard and look for tasks that might need your help.
The workboard lists error reports, grouped by the month in which they were
first observed.

→  https://phabricator.wikimedia.org/tag/wikimedia-production-error/

Or help someone that’s already started with their patch:
→  https://phabricator.wikimedia.org/maniphest/query/pzVPXPeMfRIz/#R

Breakdown of recent months (past two weeks not included):

* January: 1 report left (unchanged).
* February: 2 reports left (unchanged). ⚠️
* March: 4 reports left (unchanged). ⚠️
* April: 2 reports got fixed! (8 of 14 reports left).
* May: 4 of 10 reports left (unchanged). ⚠️
* June: 1 report got fixed! (8 of 11 reports left).
* July: 2 reports got fixed (17 of 18 reports left).
* August: 14 new reports remain unsolved.
* September: 11 new reports remain unsolved.

---

##   Thanks!

Thank you to Aaron Schulz, Daimona, David Barratt, James Forrester, Kosta
Harlan, Piotr Miazga, Roan Kattouw, Tom Arrow, Željko Filipin, and everyone
else who helped by reporting, investigating, or resolving problems in
Wikimedia production. Thanks!

Until next time,

– Timo Tijhof

---


Footnotes:

[1] Incidents. –
https://wikitech.wikimedia.org/wiki/Special:PrefixIndex?prefix=Incident+documentation%2F201908=0=1=1

[2] Tasks created. –
https://phabricator.wikimedia.org/maniphest/query/8fpsoBLrmlFu/#R

[3] Tasks closed. –
https://phabricator.wikimedia.org/maniphest/query/U9.KRVNW52Yb/#R

[4] Open tasks. –
https://phabricator.wikimedia.org/maniphest/query/47MGY8BUDvRD/#R

Re: [Wikitech-l] PLURAL in mw.msg

2019-10-01 Thread Krinkle
On Wed, 2 Oct 2019 at 03:17, Jeroen De Dauw  wrote:

>
> Including mediawiki.jqueryMsg fixed the issue. Thanks a bunch!
>
> I concur this is rather peculiar design. It'd be helpful to have some
> indication of this in the method docs of the mw.msg() code.


It is documented in the block for message.text() and message.parse(), with
mw.msg() being a shortcut for message.text(). The mw.msg() docs could be
improved to mention this directly as well!

-- Krinkle

https://doc.wikimedia.org/mediawiki-core/master/js/#!/api/mw.Message-method-text

https://doc.wikimedia.org/mediawiki-core/master/js/#!/api/mw.Message-method-parse

https://doc.wikimedia.org/mediawiki-core/master/js/#!/api/mw-method-msg

[Wikitech-l] Fresh-node: 19.10.1

2019-10-01 Thread Krinkle
I've published a new version of Fresh. Fresh is a simple way to create
light and fast isolated contexts in your Terminal. For example, when you
need to run 'npm' commands that install and run code needed for ESLint,
Grunt or Selenium tests.

Get started at https://github.com/wikimedia/fresh

See also:
*
https://www.mediawiki.org/wiki/Manual:JavaScript_unit_testing#Getting_started
*
https://www.mediawiki.org/wiki/Selenium/Node.js/Target_Local_MediaWiki_(Container)

Background:
Last month I wrote [1] about the risks and dangers involved with running
"npm install" and "npm test" commands as developers. In a nutshell: There
are no built-in protections. At risk are your personal data, web browser
session, and more. Interactions with 'git', 'sudo' or 'ssh' are also easy
to spy on or influence. This is all in addition to the "normal" risk of
packages having undiscovered malicious (or non-malicious) security problems
in indirect dependencies that have never been audited for security by
anyone you'd know or trust. In particular, I think it is important to
understand that npm is different from Debian or PyPI in terms of social
etiquette and curation. More about that at [1].

-- Timo

[1]
https://medium.com/@timotijhof/how-to-protect-yourself-from-vulnerable-npm-packages-c03f85249651

Re: [Wikitech-l] Declaring methods final in classes

2019-09-01 Thread Krinkle
On Sun, Sep 1, 2019 at 12:40 PM Aryeh Gregor  wrote:

> On Fri, Aug 30, 2019 at 10:09 PM Krinkle  wrote:
> > For anything else, it doesn't really work in my experience because
> > PHPUnit won't actually provide a valid implementation of the interface.
> > It returns null for anything, which is usually not a valid
> > implementation of the contract the class under test depends on. [..]
>
> In my experience, classes that use a particular interface usually call
> only one or two methods from it, which can usually be replaced by
> returning fixed values or at most two or three lines of code. [..]
>
> It could sometimes not be worth it to mock for the reason you give,
> but in my experience that's very much the exception rather than the rule.
> [..]


> Consider a class like ReadOnlyMode. It has an ILoadBalancer injected.
> The only thing it uses it for is getReadOnlyReason(). When testing
> ReadOnlyMode, we want to test "What happens if the load balancer
> returns true? What about false?" A mock allows us to inject the
> required info with a few lines of code.


While I do prefer explicit dependencies, an extreme of anything is rarely
good. For this case, I'd agree and follow the same approach.
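
As a rough illustration of the ReadOnlyMode case quoted above, here is a sketch in Python using `unittest.mock`. It is not MediaWiki or PHPUnit code: the simplified `ReadOnlyMode` class and the `make_mode` helper are hypothetical, and the sketch assumes the load balancer returns either a reason string or False.

```python
from unittest.mock import Mock


class ReadOnlyMode:
    """Simplified stand-in for the class under test (an assumption)."""

    def __init__(self, load_balancer):
        self._load_balancer = load_balancer

    def is_read_only(self):
        # Assumed contract: a reason string when read-only, False otherwise.
        return self._load_balancer.get_read_only_reason() is not False


def make_mode(reason):
    """Build a ReadOnlyMode whose load balancer returns a fixed answer."""
    load_balancer = Mock()
    load_balancer.get_read_only_reason.return_value = reason
    return ReadOnlyMode(load_balancer), load_balancer


mode_with_reason, lb_with_reason = make_mode("Database maintenance")
read_only_when_reason = mode_with_reason.is_read_only()

mode_without_reason, lb_without_reason = make_mode(False)
read_only_when_false = mode_without_reason.is_read_only()
```

Only the single method the class actually calls needs a canned answer. The rest of the load balancer interface never has to be implemented.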

And I think there are plenty of other examples where I'd probably go for a
partial mock. Doing so feels like a compromise to me, as I have often seen
this associated with working around technical debt (e.g. the relationship
between those classes could've been better perhaps). But I think it's fair
to say there will be cases where the relationship is fine and it still
makes sense to write a partial mock to keep the test simple.

In addition to that, I do think it is important to also have at least one
test case at the integration level to verify that the higher-level purpose
and use case of the feature works as intended - where you'd have to
construct the full dependency tree and mock or stub only the signal that is
meant to travel through the layers. That rules out any mistake in the
(separate) unit tests for ReadOnlyMode and LoadBalancer. But you wouldn't
need to have full coverage through that - just test the communication
between the two. The unit tests would then aim for coverage of all the edge
cases and variations.

Thanks for writing this up.


> 5) In some cases you want to know that a certain method is being
> called with certain parameters, [..] Maybe the bug changed the
> WHERE clause to one that happens to select your row of test data
> despite being wrong.
>

Yep, this is important. I tend to go for lower level observation instead of
injected assertions. E.g. mock IDatabase::select/insert etc. to stash each
query in an array, and then toward the end assert that the desired queries
have occurred exactly as intended and nothing more. Similar to how one
can memory-buffer a Logger instance.

I like the higher confidence this gives, and I find it easier to write than
a PHPUnit mock where each function call is precisely counted and its params
validated. It is also more tolerant of internal details changing over time.
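
A rough sketch of the query-stashing idea, in Python rather than PHP to keep
it short. RecordingDatabase and watch_page are hypothetical names, and the
select/insert signatures are simplified assumptions rather than the real
IDatabase interface:

```python
class RecordingDatabase:
    """Hypothetical fake: stashes every query instead of running it."""

    def __init__(self):
        self.queries = []

    def select(self, table, fields, conds):
        self.queries.append(("select", table, tuple(fields), dict(conds)))
        return []  # pretend no rows matched

    def insert(self, table, row):
        self.queries.append(("insert", table, dict(row)))
        return True


def watch_page(db, user_id, page_id):
    """Hypothetical code under test: insert a row unless one already exists."""
    conds = {"wl_user": user_id, "wl_page": page_id}
    if not db.select("watchlist", ["wl_id"], conds):
        db.insert("watchlist", conds)


db = RecordingDatabase()
watch_page(db, 7, 42)

# Assert at the end: exactly these queries, in this order, and nothing more.
assert db.queries == [
    ("select", "watchlist", ("wl_id",), {"wl_user": 7, "wl_page": 42}),
    ("insert", "watchlist", {"wl_user": 7, "wl_page": 42}),
]
```

A wrong WHERE clause shows up as a mismatch in the recorded conditions, even
if it would have happened to select the right test row.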

Having said that, they both achieve the same goal. I suppose it's a matter
of taste. Certainly nothing wrong with the PHPUnit approach, I'd just not
do it myself as much. Thanks for reminding me of this.


> 6) [..]
> One way to help catch inadvertent performance regressions is to test
> that the means of ensuring performance are being used properly. For
> instance, if a method is supposed to first check a cache and only fall
> back to the database for a cache miss, we want to test that the
> database query is actually only issued in the event of a cache miss.


Yep, that works. I tend to do this differently, but it works and it's
important that we make these assertions somewhere. I don't disagree :)
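
For illustration, a minimal sketch of such an assertion, in Python rather
than PHP. CountingDatabase and CachedLookup are hypothetical names, and the
cache is assumed to be a plain in-memory dict:

```python
class CountingDatabase:
    """Hypothetical fake that counts how often it is queried."""

    def __init__(self, rows):
        self.rows = rows
        self.calls = 0

    def fetch(self, key):
        self.calls += 1
        return self.rows.get(key)


class CachedLookup:
    """Hypothetical code under test: check the cache first, then the database."""

    def __init__(self, cache, db):
        self.cache = cache
        self.db = db

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.db.fetch(key)
        self.cache[key] = value
        return value


db = CountingDatabase({"a": 1})
lookup = CachedLookup({}, db)

assert lookup.get("a") == 1
assert db.calls == 1  # cache miss: one database query
assert lookup.get("a") == 1
assert db.calls == 1  # cache hit: no further query
```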

-- Timo
