Thanks for the summary. :)

Pine
( https://meta.wikimedia.org/wiki/User:Pine )


On Fri, Aug 10, 2018 at 1:43 AM Krinkle <[email protected]> wrote:

> How did we do in our pursuit of operational excellence since last month?
> Read on to find out!
>
> ## The month in numbers
>
> * 2 documented incidents since July 19. [1]
> * 55 Wikimedia-log-errors tasks closed after July 19. [2]
> * 31 Wikimedia-log-errors tasks created after July 19. [3]
>
> Logstash (type=mediawiki, last 7 days):
> * 2,048 fatals. (channel=fatal)
> * 117,372 exceptions. (channel=exception)
> * 21,043 PHP errors. (channel=error)
> * 6,368,647 total error-level events. (channel=*, level=ERROR)
>
> ## Highlights
>
> ### New database partition
>
> @Josve05a reported that Special:Log was timing out on commons.wikimedia.org
> for certain queries. Database administrator @Marostegui investigated the
> underlying query and found that it was caused by one of the backend
> database servers having an unpartitioned 'logging' table. Manuel took the
> server out of rotation for re-partitioning, which was completed later that
> day.
>
> – https://phabricator.wikimedia.org/T199790
>
> ### Disappearing audio players, mystery solved
>
> When Étienne Beaulé (@Ebe123) found PHP notices from the Score extension,
> they immediately investigated. It began as a fix for a typo that caused
> inefficient (but working) parsing of audio data. Upon closer inspection, a
> bigger story was uncovered. The computation of audio lengths was being
> skipped due to a mismatch in MIME types between Score and
> TimedMediaHandler. The player needs this length; without it, browsers
> had to download and parse the audio data entirely client-side, causing a
> delay of 5-20 seconds or more.
>
> Four months earlier, Andre had reported that pressing play on an audio
> player made the player disappear for a long time.
> It all makes sense now.
>
> – https://phabricator.wikimedia.org/T192550 /
> https://phabricator.wikimedia.org/T200835
>
> ### Packet loss
>
> After noticing that exception IDs shown on error pages could not be found in
> Logstash, Tim Starling started an investigation. He created a new Grafana
> dashboard, and the culprit was quickly identified: over 3000 packets were
> being dropped every second. That's over 90% of server logs – missing!
>
> 14 deployments, 9 SAL entries, and 6 days later, we finally reached 0%
> packet loss.
>
> Many thanks to Filippo Giunchedi, @BBlack, @herron, and @Gehel, who got to
> the bottom of this.
>
> Our weekly error numbers increased 100x since last month, and... that's a
> good thing!
>
> – https://grafana.wikimedia.org/dashboard/db/logstash?orgId=1&from=1530097200000&to=1533290400000
> – https://phabricator.wikimedia.org/T200960
>
> ### Vips or no Vips
>
> We use the VipsScaler extension to create thumbnails of large TIFF and PNG
> files in some cases. Test requests for it failed with "10.2.1.21 port 80:
> Connection refused". The error was puzzling because the IP does not belong
> to MediaWiki or an image-scaling service. Rather, it belongs to Proton, a
> Chromium PDF service.
>
> Investigation by @MoritzMuehlenhoff, @Reedy, and others revealed that the
> service IP used by Proton since June 2018 had previously belonged to the
> mediawiki-imagescaler pool (dissolved in April 2018). The VipsScaler
> configuration had not been updated and had stopped working in April. The
> issue went unnoticed until the IP address started responding again, this
> time from an unrelated service, which produced the errors.
>
> – https://phabricator.wikimedia.org/T199937 /
> https://phabricator.wikimedia.org/T199938
>
> ## Higher impact
>
> These cause users (of the web or the API) to see errors.
>
> New:
> * [ProofreadPage extension] https://phabricator.wikimedia.org/T201506 -
> MWContentSerializationException: The serialization is an invalid JSON
> array.
> * [Flow extension] https://phabricator.wikimedia.org/T201654 -
> InvalidArgumentException
> "The Title object yields no ID" from Flow\LinksTableUpdater.
> * [MediaWiki-Logging] https://phabricator.wikimedia.org/T201411 - Date
> input on Special:Log can cause fatal error.
>
> Carried over:
> * [Page deletion] https://phabricator.wikimedia.org/T195692 - Undelete for
> certain pages aborted by IncompleteRevisionException.
> * [AbuseFilter extension] https://phabricator.wikimedia.org/T187153 -
> Special:Abuselog throws BadMethodCallException on details/examine.
> * [Flow extension] https://phabricator.wikimedia.org/T70526 -
> InvalidDataException "Flow workflow is for different page".
> * [MobileFrontend] https://phabricator.wikimedia.org/T199066 -
> Special:MobileContributions shows "Special:Badtitle" (Revision::ensureTitle
> error).
>
> ## Noise
>
> These are caused by code behaving unexpectedly, but with limited impact due
> to graceful recovery by PHP or by other handling. These harm our ability to
> detect and prevent higher impact issues (through Scap and Fatal-Monitor),
> and may be masking other issues.
>
> New:
> * [FileImporter extension] https://phabricator.wikimedia.org/T200837 - PHP
> Notice: Undefined index from WikiTextContentCleaner.php.
> * [PagedTiffHandler] https://phabricator.wikimedia.org/T200839 - PHP
> Notice: Undefined index from PagedTiffHandler_body.php.
>
> Carried over: None!
>
> All of last month's noise mentions were fixed! 🎉
>
> ## Thank you
>
> Thank you to everyone for helping investigate and resolve
> #Wikimedia-log-errors.
>
> Including:
>
> * Jdforrester-WMF (James D. Forrester)
> * matmarex (Bartosz Dziewoński)
> * Marostegui (Manuel Aróstegui)
> * zeljkofilipin (Željko Filipin)
> * Ebe123 (Étienne Beaulé)
> * jcrespo (Jaime Crespo)
> * dcausse (David Causse)
> * Jdlrobson (Jon Robson)
> * Addshore (Adam_WMDE)
> * EBjune (Erika Bjune)
> * Anomie (Brad Jorsch)
> * Aaron (Aaron Schulz)
> * Reedy (Sam Reed)
>
> Thanks!
>
> Until next time,
> -- Timo Tijhof
>
>
> [1] https://wikitech.wikimedia.org/w/index.php?title=Category:Incident_documentation&pagefrom=Incident+documentation%2F20180719
> [2] https://phabricator.wikimedia.org/maniphest/query/h1j5IXlqAUPJ/#R
> [3] https://phabricator.wikimedia.org/maniphest/query/MtotJEtlSU5_/#R
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l