Wow! That is amazing news — great work all around!

On Mon, Aug 22, 2022 at 8:57 AM Amir Sarabadani <[email protected]>
wrote:

> Hello,
>
> After many years of work, I'm happy to announce a milestone in addressing
> one of our major areas of tech debt in database infrastructure: we have
> eliminated all schema drifts between MediaWiki core and production.
>
> It all started six years ago when users in English Wikipedia reported that
> checking history of some pages is quite slow *at random*. More in-depth
> analysis showed the revision table in English Wikipedia was missing an
> important index in some of the replicas. An audit of the schema of the
> revision table revealed much bigger drifts in the revision table of that
> Wiki. You can read more in its ticket: T132416
> <https://phabricator.wikimedia.org/T132416>
>
> Lack of schema parity between expectation and reality is quite dangerous.
> Trying to force an index in code assuming it would exist in production
> (under the same name) would cause fatal error every time it’s attempted.
> Trying to write to a field that doesn’t exist is similar. Such changes
> easily pass tests and work well in our test setups (such as beta cluster)
> just to cause an outage in production.
>
> If only one table in one Wiki had this many drifts, looking at all Wikis
> and all tables became of vital importance. We have around ~1,000 wikis,
> ~200 hosts (each one hosting on average ~100 Wikis), and each Wiki has
> around ~130 tables (half of them being tables from MediaWiki core) and each
> table can have multiple drifts.
>
> We slowly started looking for and addressing schema drifts five years ago
> and later automated the discovery by utilizing abstract schema (before
> that, the tool had to parse SQL) and discovered an overwhelming number of
> drifts. You can look at the history of the work in T104459
> <https://phabricator.wikimedia.org/T104459>.
>
> Around fifty tickets addressing the drifts have been completed and they
> are collected in T312538 <https://phabricator.wikimedia.org/T312538>. I
> suggest checking some of them to see the scale of the work done. Each one
> of these tickets took days to months of work to finish. Large number of
> them also existed in primary databases, requiring a primary switchover and
> read-only time for one or more Wikis. Each drift was different, in some
> cases, you needed to change the code and not production so it needed a
> thorough investigation.
>
> Why do such drifts happen? The most common reason was when a schema change
> happened in code but it was never requested to be applied in production.
> For example, a schema change in code in 2007 led to having any wiki created
> before that date to have a different schema than wikis created after it. We
> introduced processes
> <https://wikitech.wikimedia.org/wiki/Schema_changes#Workflow_of_a_schema_change>
> and tooling to make sure this doesn’t happen anymore in 2015 but we still
> needed to address previous drifts. The second common reason was when a host
> didn’t get the schema change for various reasons (was out of rotation when
> the schema was being applied, a shortcoming of the manual process). By
> automating <https://wikitech.wikimedia.org/wiki/Auto_schema> most of the
> schema change operational work we reduced the chance of such drifts from
> happening as well.
>
> After finishing core, we now need to look at WMF-deployed extensions,
> starting with FlaggedRevs <https://phabricator.wikimedia.org/T313253>
> that, while being deployed to only 50 wikis and having only 8 tables, has
> ~7,000 drifts. Thankfully, most other extensions are in a healthier state.
>
> I would like to personally thank Manuel Arostegui and Jaime Crespo for
> their monumental dedication to fix these issues in the past years. Also a
> big thank you to several of our amazing developers, Umherirrender, James
> Forrester and Sam Reed who helped on reporting, going through the history
> of MediaWiki to figure out why these drifts happened, and helping build the
> reporting tools.
>
> Best
> --
> *Amir Sarabadani (he/him)*
> Staff Database Architect
> Wikimedia Foundation <https://wikimediafoundation.org/>
> _______________________________________________
> Wikitech-l mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/



-- 
* Sammy Tarling *(she/her)

Software Engineer

Wikimedia Foundation <https://wikimediafoundation.org/>
_______________________________________________
Wikitech-l mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

Reply via email to