Thanks for the summary. :)

Pine
( https://meta.wikimedia.org/wiki/User:Pine )

On Fri, Aug 10, 2018 at 1:43 AM Krinkle <[email protected]> wrote:

> How did we do in our strive for operational excellence since last month?
> Read on to find out!
>
> ## The month in numbers
>
> * 2 documented incidents since July 19. [1]
> * 55 Wikimedia-log-errors tasks closed after July 19. [2]
> * 31 Wikimedia-log-errors tasks created after July 19. [3]
>
> Logstash (type=mediawiki, last 7 days):
> * 2,048 fatals. (channel=fatal)
> * 117,372 exceptions. (channel=exception)
> * 21,043 PHP errors. (channel=error)
> * 6,368,647 total error-level events. (channel=*, level=ERROR)
>
> ## Highlights
>
> ### New database partition
>
> @Josve05a reported that Special:Log was timing out on
> commons.wikimedia.org for certain queries. Database administrator
> @Marostegui investigated the underlying query and found out this was
> caused by one of the backend database servers having an unpartitioned
> 'logging' table. Manuel took the server out of rotation for
> re-partitioning, which was completed later that day.
>
> – https://phabricator.wikimedia.org/T199790
>
> ### Disappearing audio players, mystery solved
>
> When Étienne Beaulé (@Ebe123) found PHP-Notice errors in the Score
> extension, they immediately investigated. It began as the fixing of a
> typo that caused inefficient (but working) parsing of audio data. Upon
> closer inspection, a bigger story was uncovered. The computation of
> audio lengths was being skipped due to a mismatch in MIME-types between
> Score and TimedMediaHandler. The player needs this length, and as a
> result, browsers had to download and parse the audio data entirely
> client-side, creating a delay of 5-20 seconds or more.
>
> Four months earlier, Andre reported that pressing play on an audio
> player made the player disappear for a long time.
> It all makes sense now.
>
> – https://phabricator.wikimedia.org/T192550 /
> https://phabricator.wikimedia.org/T200835
>
> ### Packet loss
>
> After noticing that exception IDs from error pages were not found in
> Logstash, Tim Starling started an investigation. He created a new
> Grafana dashboard and the culprit was quickly identified. Over 3000
> packets were being dropped, every second. That's over 90% of server
> logs – missing!
>
> 14 deployments, 9 SAL entries, and 6 days later, we finally reached 0%
> packet loss.
>
> Many thanks to Filippo Giunchedi, @BBlack, @herron, @Gehel who got to
> the bottom of this.
>
> Our weekly error numbers increased 100X since last month, and... that's
> a good thing!
>
> – https://grafana.wikimedia.org/dashboard/db/logstash?orgId=1&from=1530097200000&to=1533290400000
> – https://phabricator.wikimedia.org/T200960
>
> ### Vips or no Vips
>
> We use the VipsScaler extension to create thumbnails of large TIFF and
> PNG files in some cases. Test requests for it failed with "10.2.1.21
> port 80: Connection refused". The error was puzzling because the IP
> does not belong to MediaWiki or an image-scaling service. Rather, it
> belongs to Proton, a Chromium PDF service.
>
> Investigation from @MoritzMuehlenhoff, @Reedy, and others revealed the
> service IP used by Proton since June 2018 previously belonged to the
> mediawiki-imagescaler pool (dissolved in April 2018). Configuration for
> VipsScaler was outdated and stopped working in April. The issue was not
> noticed until the IP address started working again, with an unrelated
> service producing errors.
>
> – https://phabricator.wikimedia.org/T199937 /
> https://phabricator.wikimedia.org/T199938
>
> ## Higher impact
>
> These cause users (of web or api) to see errors.
>
> New:
> * [ProofreadPage extension] https://phabricator.wikimedia.org/T201506 -
>   MWContentSerializationException: The serialization is an invalid JSON
>   array.
> * [Flow extension] https://phabricator.wikimedia.org/T201654 -
>   InvalidArgumentException "The Title object yields no ID" from
>   Flow\LinksTableUpdater.
> * [MediaWiki-Logging] https://phabricator.wikimedia.org/T201411 - Date
>   input on Special:Log can cause fatal error.
>
> Carried over:
> * [Page deletion] https://phabricator.wikimedia.org/T195692 - Undelete
>   for certain pages aborted by IncompleteRevisionException.
> * [AbuseFilter extension] https://phabricator.wikimedia.org/T187153 -
>   Special:Abuselog throws BadMethodCallException on details/examine.
> * [Flow extension] https://phabricator.wikimedia.org/T70526 -
>   InvalidDataException "Flow workflow is for different page".
> * [MobileFrontend] https://phabricator.wikimedia.org/T199066 -
>   Special:MobileContributions shows "Special:Badtitle"
>   (Revision::ensureTitle error).
>
> ## Noise
>
> These are caused by code behaving unexpectedly, but with limited impact
> due to graceful recovery by PHP, or other handling. These harm our
> ability to detect and prevent higher impact issues (through Scap and
> Fatal-Monitor), and may be masking other issues.
>
> New:
> * [FileImporter extension] https://phabricator.wikimedia.org/T200837 -
>   PHP Notice: Undefined index from WikiTextContentCleaner.php.
> * [PagedTiffHandler] https://phabricator.wikimedia.org/T200839 - PHP
>   Notice: Undefined index from PagedTiffHandler_body.php.
>
> Carried over: None!
>
> All of last month's noise mentions were fixed! 🎉
>
> ## Thank you
>
> Thank you to everyone for helping investigate/resolve
> #Wikimedia-log-errors.
>
> Including:
>
> * Jdforrester-WMF (James D. Forrester)
> * matmarex (Bartosz Dziewoński)
> * Marostegui (Manuel Aróstegui)
> * zeljkofilipin (Željko Filipin)
> * Ebe123 (Étienne Beaulé)
> * jcrespo (Jaime Crespo)
> * dcausse (David Causse)
> * Jdlrobson (Jon Robson)
> * Addshore (Adam_WMDE)
> * EBjune (Erika Bjune)
> * Anomie (Brad Jorsch)
> * Aaron (Aaron Schulz)
> * Reedy (Sam Reed)
>
> Thanks!
>
> Until next time,
> -- Timo Tijhof
>
> [1] https://wikitech.wikimedia.org/w/index.php?title=Category:Incident_documentation&pagefrom=Incident+documentation%2F20180719
> [2] https://phabricator.wikimedia.org/maniphest/query/h1j5IXlqAUPJ/#R
> [3] https://phabricator.wikimedia.org/maniphest/query/MtotJEtlSU5_/#R
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
