Just wanted to emphasize that this is a great effort, and a huge step towards
improving the current reliability of our services.
We should do more of this, broader and more exhaustive.

Kudos!

On 06/28 12:33, Kunal Mehta wrote:
> Hi,
> 
> Today we switched over most services and traffic caches from the eqiad
> (Virginia) datacenter to codfw (Texas) as part of improving our reliability.
> The goal is to have this procedure working and regularly tested in case of
> an emergency when we actually need it.
> 
> We're only aware of one user-facing impact: for a short time, WDQS lag
> detection was broken, affecting Wikidata bots that check it. This is tracked
> as <https://phabricator.wikimedia.org/T285710>.
> 
> Users will experience a bit of a latency increase for now as most user
> traffic will need to talk to both eqiad and codfw datacenters. This will go
> away tomorrow once MediaWiki is switched over (keep reading).
> 
> Also, we were a bit delayed in starting today because of an issue causing
> appservers to get stuck: <https://phabricator.wikimedia.org/T285634>.
> 
> == Services ==
> Started at 14:29 UTC, officially finished at 15:09.
> 
> The main issues we ran into were:
> * the helm-charts service is unique and doesn't have a service IP, causing
> the automatic switchover verification to break. This required us to manually
> check the other services that come after it in the list, and then re-run
> the cookbook while excluding it. Tracked as
> <https://phabricator.wikimedia.org/T285707>.
> * the restbase-async service has some special handling. We debated whether
> to follow it and opted not to special-case the service. Figuring out
> what to do long-term is <https://phabricator.wikimedia.org/T285711>.
> * the WDQS issue mentioned earlier.
> 
> == Traffic ==
> Started at 15:43, finished at 15:45.
> 
> It took until ~16:25 for eqiad to mostly depool. There's not much else to
> report; it went very smoothly.
> 
> == Tomorrow's MediaWiki switchover ==
> Scheduled for 14:00 UTC <https://zonestamp.toolforge.org/1624888854>.
> 
> It is our goal to minimize the read-only time and make this a non-event from
> a user perspective.
> 
> All of the coordination will take place in the #wikimedia-operations IRC
> channel on Libera Chat. You're more than welcome to follow along, but if you
> have questions, please ask them in #wikimedia-tech so it doesn't get
> disruptive. The procedure that we'll be following is documented at
> <https://wikitech.wikimedia.org/wiki/Switch_Datacenter#MediaWiki>.
> 
> I'm planning to do one more "live test" later today and will announce that on
> IRC when it gets started.
> 
> -- Kunal
> _______________________________________________
> Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
> To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

-- 
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE  1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."

