Just wanted to emphasize that this is a great effort, and a huge step towards improving the current reliability of our services. We should do more of this, broader and more exhaustive.
Kudos!

On 06/28 12:33, Kunal Mehta wrote:
> Hi,
>
> Today we switched over most services and traffic caches from the eqiad
> (Virginia) datacenter to codfw (Texas) as part of improving our reliability.
> The goal is to have this procedure working and regularly tested in case of
> an emergency when we actually need it.
>
> We're only aware of one user-facing impact, for a short time WDQS lag
> detection was broken, affecting Wikidata bots that check it. This is tracked
> as <https://phabricator.wikimedia.org/T285710>.
>
> Users will experience a bit of a latency increase for now as most user
> traffic will need to talk to both eqiad and codfw datacenters. This will go
> away tomorrow once MediaWiki is switched over (keep reading).
>
> Also, we were a bit delayed in starting today because of an issue causing
> appservers to get stuck: <https://phabricator.wikimedia.org/T285634>.
>
> == Services ==
> Started at 14:29 UTC, officially finished at 15:09.
>
> The main issues we ran into were:
> * the helm-charts service is unique and doesn't have a service IP, causing
> the automatic switchover verification to break. This required us to manually
> check the other services that come after it in the list, and then re-run
> cookbook while excluding it. Tracked as
> <https://phabricator.wikimedia.org/T285707>.
> * the restbase-async service has some special handling, which we debated on
> whether to follow that or not, opted to not special case it. Figuring out
> what to do long-term is <https://phabricator.wikimedia.org/T285711>.
> * the WDQS issue mentioned earlier.
>
> == Traffic ==
> Started at 15:43, finished at 15:45.
>
> It took until ~16:25 for eqiad to mostly depool. There's not much else to
> report, it went very smoothly.
>
> == Tomorrow's MediaWiki switchover ==
> Scheduled for 14:00 UTC <https://zonestamp.toolforge.org/1624888854>.
>
> It is our goal to minimize the read-only time and make this a non-event from
> a user perspective.
>
> All of the coordination will take place in the #wikimedia-operations IRC
> channel on Libera Chat. You're more than welcome to follow along but if you
> have questions, please ask them in #wikimedia-tech so it doesn't get
> disruptive. The procedure that we'll be following is documented at
> <https://wikitech.wikimedia.org/wiki/Switch_Datacenter#MediaWiki>.
>
> I'm planning to do one more "live test" later today, will announce that on
> IRC when it gets started.
>
> -- Kunal
> _______________________________________________
> Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
> To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

--
David Caro
SRE - Cloud Services
Wikimedia Foundation <https://wikimediafoundation.org/>
PGP Signature: 7180 83A2 AC8B 314F B4CE 1171 4071 C7E1 D262 69C3

"Imagine a world in which every single human being can freely share in the
sum of all knowledge. That's our commitment."