[Wikitech-l] Production Excellence #44: May 2022

2022-06-15 Thread Krinkle
How’d we do in our pursuit of operational excellence last month? Read on to
find out!

Incidents 
By golly, we've had quite the month! 10 documented incidents, which is more
than three times the two-year median of 3. The last time we experienced ten or
more incidents in one month was June 2019, when we had eleven (Incident graphs,
Excellence monthly of June 2019).

I'd like to draw your attention to something positive. As you read the list
below, take note of incidents that did *not* impact public services, and did
*not* have lasting impact or data loss. For example, the Apache incident
benefited from PyBal's automatic health-based depooling. The deployment server
incident recovered without loss thanks to Bacula. The Etcd incident impact was
limited by serving stale data. And, the Hadoop incident recovered by resuming
from Kafka right where it left off.

2022-05-01 etcd 
Impact: For 2 hours, Conftool could not sync Etcd data between our core data 
centers. Puppet and some other internal services were unavailable or out of 
sync. The issue was isolated, with no impact on public services.

2022-05-02 deployment server 

Impact: For 4 hours, we could not update or deploy MediaWiki and other 
services, due to corruption on the active deployment server. No impact on 
public services.

2022-05-05 site outage 

Impact: For 20 minutes, all wikis were unreachable for logged-in users and 
non-cached pages. This was due to a GlobalBlocks schema change causing 
significant slowdown in a frequent database query.

2022-05-09 Codfw confctl 

Impact: For 5 minutes, all web traffic routed to Codfw received error 
responses. This affected central USA and South America (local time after 
midnight). The cause was human error and lack of CLI parameter validation.

2022-05-09 exim-bdat-errors 

Impact: Over five days, about 14,000 incoming emails from Gmail users to
wikimedia.org were rejected and returned to sender.

2022-05-21 varnish cache busting 

Impact: For 2 minutes, all wikis and services behind our CDN were unavailable 
to all users.

2022-05-24 failed Apache restart 

Impact: For 35 minutes, numerous internal services that use Apache on the 
backend were down. This included Kibana (logstash) and Matomo (piwik). For 20 
of those minutes, there was also reduced MediaWiki server capacity, but no 
measurable end-user impact for wiki traffic.

2022-05-25 de.wikipedia.org 

Impact: For 6 minutes, a portion of logged-in users and non-cached pages 
experienced a slower response or an error. This was due to increased load on 
one of the databases.

2022-05-26 m1 database hardware 

Impact: For 12 minutes, internal services hosted on the m1 database (e.g. 
Etherpad) were unavailable or at reduced capacity.

2022-05-31 Analytics Hadoop failure 

Impact: For 1 hour, all HDFS writes and reads were failing. After recovery, 
ingestion from Kafka resumed and caught up. No data loss or other lasting 
impact on the Data Lake.

Incident follow-up 
Recently completed incident follow-up:

Invalid confctl selector should either error out or select nothing
Filed by Amir (@Ladsgroup) after the confctl incident this past month.
Giuseppe (@Joe) implemented CLI parameter validation to prevent human error
from causing a similar outage in the future.
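
As a rough illustration of the kind of check described in that task (this is
not confctl's actual code; the function name, the comma-separated `key=value`
selector format, and the set of allowed keys are all assumptions made for the
example), a parser can reject an invalid selector outright instead of letting
it silently match everything:

```python
# Hypothetical sketch: validate a selector string before acting on it.
# ALLOWED_KEYS and the "key=value,key=value" format are assumptions,
# not taken from confctl itself.
ALLOWED_KEYS = {"dc", "cluster", "service", "name"}


class SelectorError(ValueError):
    """Raised when a selector is empty, malformed, or uses an unknown key."""


def parse_selector(selector):
    """Return a dict of selector terms, or raise SelectorError."""
    if not selector.strip():
        raise SelectorError("empty selector")
    parsed = {}
    for part in selector.split(","):
        key, sep, value = part.partition("=")
        key, value = key.strip(), value.strip()
        # A term must look like key=value with both sides present.
        if not sep or not key or not value:
            raise SelectorError("malformed term: %r" % part)
        # Unknown keys are an error rather than being ignored.
        if key not in ALLOWED_KEYS:
            raise SelectorError("unknown key: %r" % key)
        if key in parsed:
            raise SelectorError("duplicate key: %r" % key)
        parsed[key] = value
    return parsed
```

The point of the sketch is the failure mode: a typo in a key raises an error
before any change is applied, rather than widening the selection.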

Backup opensearch dashboards data 
Filed back in 2019 by Filippo (@fgiunchedi). The OpenSearch homepage
dashboard (at logstash.wikimedia.org) was accidentally deleted last month.
Bryan (@bd808)

[Wikitech-l] Re: [engineering-all] Phabricator maintenance tomorrow 15:00 UTC (08:00 PDT)

2022-06-15 Thread Brennen Bearnes

On 6/14/22 18:21, Tyler Cipriani wrote:

> Phabricator (phabricator.wikimedia.org) will be down tomorrow sometime
> during the 15:00–17:00 UTC[0] maintenance window[1].
>
> We're deploying updates during that time, and we'll stop Phabricator
> briefly to run database migrations.


This is completed - please let us know in #wikimedia-releng or file a 
task if you encounter any bugs.


--
Brennen Bearnes
Release Engineering
Wikimedia Foundation
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

[Wikitech-l] Re: [Wikimedia-l] Developer Portal is launched! Discover Wikimedia’s technical areas and how to contribute

2022-06-15 Thread Samuel Klein
<3  Very pretty. One step closer to T26070 ...
Nice to see global site search and codesearch featured prominently;
more people should know about them.

On Wed, Jun 15, 2022 at 10:07 AM Birgit Müller wrote:

> Hi All,
>
> We are happy to announce the launch of the Wikimedia Developer Portal - a
> centralized entry point for finding technical documentation and community
> resources across Wikimedia’s key technical areas.
>
>
> The Developer Portal project is part of a broader initiative to improve the
> discoverability and overall quality of key technical documentation. Work on
> this project began in July 2020 and included research and feedback rounds,
> designing a content strategy and user journeys to help people navigate and
> achieve their tasks, technical implementation of the portal, developing a
> documentation review process, and updating key documents.
>
>
> Wikimedia’s technical documentation spans a wide range of technologies, is
> distributed (and often duplicated) across mediawiki, wikitech, code
> repositories or other places, might or might not be up-to-date, and serves
> multiple audiences.
>
>
> This complex landscape can make it hard to find the information you need.
> The goal of the Developer Portal is to make it easier for developers and
> other technical contributors to:
>
>    - Find the key documentation they need for common developer tasks.
>    - Discover available tools and technologies.
>    - Learn how to get started in Wikimedia technical areas.
>
> At its core, the Developer Portal is a navigation tool. It is an index of
> categorized links to key sources of technical information. These sources
> are hosted primarily on wikis—the portal itself contains no actual
> documentation.
>
> A major part of this project includes reviewing and updating the documents
> linked from the Developer Portal - the actual documentation (examples:
> Localisation, Communication, Cloud Services introduction). This work will
> continue over the next year while also investigating how to improve and
> scale the process.
>
>
> The Developer Portal is a project by the Developer Advocacy team within
> the Technical Engagement group at the Wikimedia Foundation and has been
> developed by a project team of technical writers, engineers, and developer
> advocates (Alex, Andre, Bryan, Sarah R, Tricia).
>
>
> The long term goal is to step-by-step move towards a future where
> Wikimedia’s technical documentation is discoverable, accurate,
> standardized, and continuously updated. Yes, this is big :-) — and yes,
> it will take a while. Hope you join us in realizing this collective effort
> of making it easier for people to contribute to the code and technical
> spaces.
>
>
> Thank you <3
>
>
> This project would not have been possible without the support, knowledge,
> ideas and feedback of many! A huge thanks to:
>
>    - the members of WMF and WMDE engineering teams who participated in
>      exploratory interviews early in the process,
>    - everyone who provided feedback at Hackathon 2021 & 2022, via private
>      messages, on the Project page or gave input or help on Phabricator tasks
>    - Developers and community members who participated in multiple rounds
>      of user testing
>    - the Design Strategy team at the Foundation for helping us with
>      designing a research study and recruiting a diverse group of users to
>      test the final version of the site
>    - people who helped with key

[Wikitech-l] Developer Portal is launched! Discover Wikimedia’s technical areas and how to contribute

2022-06-15 Thread Birgit Müller
Hi All,

We are happy to announce the launch of the Wikimedia Developer Portal - a
centralized entry point for finding technical documentation and community
resources across Wikimedia’s key technical areas.


The Developer Portal project is part of a broader initiative to improve the
discoverability and overall quality of key technical documentation. Work on
this project began in July 2020 and included research and feedback rounds,
designing a content strategy and user journeys to help people navigate and
achieve their tasks, technical implementation of the portal, developing a
documentation review process, and updating key documents.


Wikimedia’s technical documentation spans a wide range of technologies, is
distributed (and often duplicated) across mediawiki, wikitech, code
repositories or other places, might or might not be up-to-date, and serves
multiple audiences.


This complex landscape can make it hard to find the information you need.
The goal of the Developer Portal is to make it easier for developers and
other technical contributors to:

   - Find the key documentation they need for common developer tasks.
   - Discover available tools and technologies.
   - Learn how to get started in Wikimedia technical areas.

At its core, the Developer Portal is a navigation tool. It is an index of
categorized links to key sources of technical information. These sources
are hosted primarily on wikis—the portal itself contains no actual
documentation.

A major part of this project includes reviewing and updating the documents
linked from the Developer Portal - the actual documentation (examples:
Localisation, Communication, Cloud Services introduction). This work will
continue over the next year while also investigating how to improve and
scale the process.


The Developer Portal is a project by the Developer Advocacy team within the
Technical Engagement group at the Wikimedia Foundation and has been
developed by a project team of technical writers, engineers, and developer
advocates (Alex, Andre, Bryan, Sarah R, Tricia).


The long term goal is to step-by-step move towards a future where
Wikimedia’s technical documentation is discoverable, accurate,
standardized, and continuously updated. Yes, this is big :-) — and yes, it
will take a while. Hope you join us in realizing this collective effort of
making it easier for people to contribute to the code and technical spaces.


Thank you <3


This project would not have been possible without the support, knowledge,
ideas and feedback of many! A huge thanks to:

   - the members of WMF and WMDE engineering teams who participated in
     exploratory interviews early in the process,
   - everyone who provided feedback at Hackathon 2021 & 2022, via private
     messages, on the Project page or gave input or help on Phabricator tasks
   - Developers and community members who participated in multiple rounds of
     user testing
   - the Design Strategy team at the Foundation for helping us with designing
     a research study and recruiting a diverse group of users to test the
     final version of the site
   - people who helped with key doc improvements and content reviews: Haley,
     Kamil, Komla, Nick, Srishti
   - translators and translatewiki.net community, especially Abijeet, Niklas