On Tue, Dec 11, 2018, 9:18 PM Krinkle <krinklem...@gmail.com> wrote:
> 📘 Read this post on Phabricator at
> https://phabricator.wikimedia.org/phame/live/1/post/129/
> -------
>
> How’d we do in our strive for operational excellence last month? Read on to
> find out!
>
> - Month in numbers.
> - Current problems.
> - Highlighted stories.
>
> ## 📊 *Month in numbers*
>
> * 4 documented incidents in November 2018. [1]
> * 42 Wikimedia-prod-error tasks closed in November 2018. [2]
> * 36 Wikimedia-prod-error tasks created in November 2018. [3]
> * 165 currently open Wikimedia-prod-error tasks (as of 12 December 2018).
>
> Terminology:
> * An *Exception* (or fatal) causes user actions to be prevented. For
> example, a page would display “Exception: Unable to render page”, instead
> of the article content.
> * An *Error* (or non-fatal, or warning) can produce page views that are
> technically unaware of a problem, but may show corrupt, incorrect, or
> incomplete information. Examples – an article would display the code word
> “null” instead of the actual content, a user looking for Vegetables may be
> taken to an article about Vegetarians, a user may receive a notification
> that says “*You have (null) new messages.*”
>
> With that behind us... Let’s celebrate this month’s highlights!
>
> ## *️⃣ *DB exception at wikitech.wikimedia.org <http://wikitech.wikimedia.org>*
>
> Quiddity reported that he was unable to disable a spam account, due to a
> fatal exception. Andre Klapper used the Exception ID to find the stack
> trace in the logs. The trace revealed that a table was missing in
> Wikitech’s database.
>
> The MediaWiki software was recently expanded with a “Partial blocking”
> ability. [4] This involved introducing a new database table that stores
> block metadata differently. This software update was deployed to Wikitech,
> but this new table was not created.
>
> @Marostegui (Database administrator) quickly applied the schema patches
> that create the missing table.
> Thanks Manuel, Andre, and Quiddity; Teamwork!
>
> – https://phabricator.wikimedia.org/T209674
>
> ## *️⃣ *Big-page Deletion Unleashed!*
>
> It had been known for years, [5] that users are unable to delete or restore
> pages with more than a few hundred revisions. Attempts to do so could fail,
> with a fatal “DBTransactionSizeError” exception. This error indicates that
> the change is too big or too slow. Such changes risk replication lag, and
> may impact the stability of the infrastructure.
>
> The database structure used by MediaWiki for page archives dates back to
> 2003 (over 15 years ago). I'll spare you the details, but it depends on
> database interactions that are inherently slow when applied to systems as
> big as Wikipedia! RFC T20493 intends to modernise this structure for the
> long-term.
>
> Then along came @BPirkle. Bill joined the WMF Core platform team earlier
> this year. He took on the challenge of making page deletion work for any
> size page, today.
>
> Previously, page deletion happened in a single step. This simple approach
> had the benefit of either succeeding in its entirety, or safely rolling
> back like nothing happened. It also meant that the database protected us
> against conflicting changes. In August, Bill started a two-month effort
> that carefully split the logic for “delete a page” into smaller steps that
> each are safe and quick. It now uses our JobQueue to schedule and run these
> steps, without the user waiting for it.
>
> – https://phabricator.wikimedia.org/T198176 /
> https://gerrit.wikimedia.org/r/456035
>
> ## 📉 *Current problems*
>
> Take a look at the workboard and look for tasks that might need your help.
> The workboard lists known issues, grouped by the week in which they were
> first observed.
>
> → https://phabricator.wikimedia.org/tag/wikimedia-production-error/
>
> I’d like to draw attention to a subset of PHP fatal errors. Specifically,
> those that are publicly exposed (e.g.
> don’t require elevated user rights) and use an HTTP 500 status code.
>
> * CentralNotice: Some Special:CentralNoticeBanners urls fatal. – https://phabricator.wikimedia.org/T149240
> * Flow: Unable to view certain talk pages due to workflow InvalidDataException. – https://phabricator.wikimedia.org/T70526
> * JsonConfig: Unable to diff certain “.map” pages on Commons. – https://phabricator.wikimedia.org/T203063
> * MediaWiki (Parser): Parse API exposes fatal content model error. – https://phabricator.wikimedia.org/T206253
> * MediaWiki (Special-pages): Special:DoubleRedirects unavailable on ttwiki. – https://phabricator.wikimedia.org/T204800
> * MobileFrontend: Some Special:MobileDiff urls fatal. – https://phabricator.wikimedia.org/T156293

Note that this is not a MobileFrontend issue but an issue with the MassMessage extension; it impacts desktop too. See
https://www.mediawiki.org/w/index.php?title=User%3AQuiddity%2Fdemomodel&type=revision&diff=2234116&oldid=2234115

> * ProofreadPage: Unable to edit certain pages on Wikisource. – https://phabricator.wikimedia.org/T176196
> * Translate: Some Special:Translate urls fatal. – https://phabricator.wikimedia.org/T204833
> * Wikibase: Clicking “undo” for some revisions fatals with a PatcherException. – https://phabricator.wikimedia.org/T97146
>
> Public user requests resulting in fatals can cause (and have caused) alerts
> to fire that notify SRE of wikis potentially being less available or down.
>
> 💡*ProTip*: Cross-reference one workboard with another via
> “Open Tasks” > “Advanced Filter” and enter Tag(s) to apply as a filter.
>
> ## 🎉 *Thank you*
>
> Thank you to everyone who helped by reporting or investigating problems in
> Wikimedia production; and for implementing or reviewing their solutions.
> Including: tstarling, thiemowmde, thcipriani, Tgr, Steinsplitter, Quiddity,
> pmiazga, Nikerabbit, Mvolz, Lucas_Werkmeister_WMDE, kostajh, jrbs, JJMC89,
> Jdforrester-WMF, hashar, Gilles, Daimona, Ciencia_Al_Poder, Catrope,
> BPirkle, Barkeep49, Anomie, and Aklapper.
>
> Thanks!
>
> Until next time,
>
> – Timo Tijhof
>
> -------
>
> Footnotes:
>
> [1] Incidents. –
> https://wikitech.wikimedia.org/wiki/Special:AllPages?from=Incident+documentation%2F20181101&to=Incident+documentation%2F20181131&namespace=0
>
> [2] Tasks closed. –
> https://phabricator.wikimedia.org/maniphest/query/.PkyGL4Rz_4i/#R
>
> [3] Tasks opened. –
> https://phabricator.wikimedia.org/maniphest/query/WsqbAxlHPLwk/#R
>
> [4] Partial blocks. –
> https://meta.wikimedia.org/wiki/Community_health_initiative/Per-user_page,_namespace,_and_upload_blocking
>
> [5] Bug report about page deletion, 2007.
> – https://phabricator.wikimedia.org/T13402
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

--
Jon Robson
twitter: @jdlrobson
linkedin: https://www.linkedin.com/in/jorobson/