Wow! That is amazing news — great work all around!
On Mon, Aug 22, 2022 at 8:57 AM Amir Sarabadani <[email protected]> wrote: > Hello, > > After many years of work, I'm happy to announce a milestone in addressing > one of our major areas of tech debt in database infrastructure: we have > eliminated all schema drifts between MediaWiki core and production. > > It all started six years ago when users in English Wikipedia reported that > checking history of some pages is quite slow *at random*. More in-depth > analysis showed the revision table in English Wikipedia was missing an > important index in some of the replicas. An audit of the schema of the > revision table revealed much bigger drifts in the revision table of that > Wiki. You can read more in its ticket: T132416 > <https://phabricator.wikimedia.org/T132416> > > Lack of schema parity between expectation and reality is quite dangerous. > Trying to force an index in code assuming it would exist in production > (under the same name) would cause fatal error every time it’s attempted. > Trying to write to a field that doesn’t exist is similar. Such changes > easily pass tests and work well in our test setups (such as beta cluster) > just to cause an outage in production. > > If only one table in one Wiki had this many drifts, looking at all Wikis > and all tables became of vital importance. We have around ~1,000 wikis, > ~200 hosts (each one hosting on average ~100 Wikis), and each Wiki has > around ~130 tables (half of them being tables from MediaWiki core) and each > table can have multiple drifts. > > We slowly started looking for and addressing schema drifts five years ago > and later automated the discovery by utilizing abstract schema (before > that, the tool had to parse SQL) and discovered an overwhelming number of > drifts. You can look at the history of the work in T104459 > <https://phabricator.wikimedia.org/T104459>. > > Around fifty tickets addressing the drifts have been completed and they > are collected in T312538 <https://phabricator.wikimedia.org/T312538>. I > suggest checking some of them to see the scale of the work done. Each one > of these tickets took days to months of work to finish. Large number of > them also existed in primary databases, requiring a primary switchover and > read-only time for one or more Wikis. Each drift was different, in some > cases, you needed to change the code and not production so it needed a > thorough investigation. > > Why do such drifts happen? The most common reason was when a schema change > happened in code but it was never requested to be applied in production. > For example, a schema change in code in 2007 led to having any wiki created > before that date to have a different schema than wikis created after it. We > introduced processes > <https://wikitech.wikimedia.org/wiki/Schema_changes#Workflow_of_a_schema_change> > and tooling to make sure this doesn’t happen anymore in 2015 but we still > needed to address previous drifts. The second common reason was when a host > didn’t get the schema change for various reasons (was out of rotation when > the schema was being applied, a shortcoming of the manual process). By > automating <https://wikitech.wikimedia.org/wiki/Auto_schema> most of the > schema change operational work we reduced the chance of such drifts from > happening as well. > > After finishing core, we now need to look at WMF-deployed extensions, > starting with FlaggedRevs <https://phabricator.wikimedia.org/T313253> > that, while being deployed to only 50 wikis and having only 8 tables, has > ~7,000 drifts. Thankfully, most other extensions are in a healthier state. > > I would like to personally thank Manuel Arostegui and Jaime Crespo for > their monumental dedication to fix these issues in the past years. Also a > big thank you to several of our amazing developers, Umherirrender, James > Forrester and Sam Reed who helped on reporting, going through the history > of MediaWiki to figure out why these drifts happened, and helping build the > reporting tools. > > Best > -- > *Amir Sarabadani (he/him)* > Staff Database Architect > Wikimedia Foundation <https://wikimediafoundation.org/> > _______________________________________________ > Wikitech-l mailing list -- [email protected] > To unsubscribe send an email to [email protected] > https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/ -- * Sammy Tarling *(she/her) Software Engineer Wikimedia Foundation <https://wikimediafoundation.org/>
_______________________________________________ Wikitech-l mailing list -- [email protected] To unsubscribe send an email to [email protected] https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
