Specifically:

if we measure read-only time as "an editor can't start an edit because the
wikis are read-only", then the read-only time is 119 seconds;
if we measure it by the timestamps of the last edit saved before the
switchover and the first one saved after it, that's 94 seconds.

As Amir explained, we leave some room for propagation of the MediaWiki
read-only mode (about 10-15 seconds) and for in-flight edits (another 10
seconds) before we set the databases to read-only as well.
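
For anyone who wants to redo the arithmetic, here is a minimal sketch in
Python. It uses only the two recentchange timestamps reported in the quoted
message below and the announced 119 seconds; the function and variable names
are made up for illustration. It is an assumption, not an established fact,
that the whole difference between the two figures is due to the margins
described above.

```python
from datetime import datetime

# Timestamps reported from the recentchange stream in this thread (UTC).
last_edit_before = datetime.fromisoformat("2023-03-01T14:00:30")
first_edit_after = datetime.fromisoformat("2023-03-01T14:02:05")


def edit_gap_seconds(last_before, first_after):
    """Seconds without a saved edit, not counting the second of the last edit."""
    return int((first_after - last_before).total_seconds()) - 1


announced_read_only = 119  # seconds, as announced by SRE
observed_gap = edit_gap_seconds(last_edit_before, first_edit_after)
difference = announced_read_only - observed_gap

# Margins described above: 10-15 s for propagation of the MediaWiki read-only
# mode, plus about 10 s for in-flight edits.
margin_low, margin_high = 10 + 10, 15 + 10

print(observed_gap, difference, margin_low <= difference <= margin_high)
```

With these inputs the gap without saved edits comes out at 94 seconds and the
difference from the announced figure at 25 seconds, which falls within the
20-25 second margin mentioned above.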

I think 2 minutes of read-only for such a complex operation is a good
balance between reasonable change safety and reduction of impact; we could
reduce the read-only time by another 10-20 seconds with some more
aggressive moves (like clearing the DNS recursor caches), but I don't think
there's much value in that at this point.

I'll add: if anyone is interested in knowing more and is coming to the
hackathon, I'll be happy to run an impromptu session about how we handle
this procedure.

Cheers,

Giuseppe

On Wed, Mar 1, 2023 at 6:16 PM Amir Sarabadani <ladsgr...@gmail.com> wrote:

> It's a bit complicated.
> When SRE sets the read-only mark, they start counting from that time. The
> mark then starts propagating, which takes a while to reach all users, so
> some users might already see the RO error while actual writes are still
> happening somewhere else because the cache is not invalidated yet (I think
> it has a TTL of 5 seconds, but I need to double-check). We still count
> that as RO time because it's affecting users regardless.
>
> HTH
>
>
>
> On Wed, Mar 1, 2023 at 18:06, Dušan Kreheľ <dusankre...@gmail.com> wrote:
>
>> Clément Goubert and everybody,
>>
>> I analyzed https://stream.wikimedia.org/v2/stream/recentchange and I
>> got a different result.
>>
>> Last change (before migration): 2023-03-01T14:00:30
>> First change (after migration): 2023-03-01T14:02:05
>> Result: Downtime (14:00:31 to 14:02:05) is 94 s.
>>
>> I think this analysis is more authoritative. I believe the stream's
>> timestamps are based on something like REQUEST_TIME in PHP.
>>
>> Dušan Kreheľ
>>
>>
>> 2023-03-01 16:30 GMT+01:00, Clément Goubert <cgoub...@wikimedia.org>:
>> > Dear Wikitechians,
>> >
>> > Dear colleagues,
>> >
>> > The switchover process requires a *brief read-only period for all
>> > Foundation-hosted wikis*, which started at *14:00 UTC on Wednesday March
>> > 1st*, and lasted *119 seconds*. All our public and private wikis continued
>> > to be available for reading as usual. Users saw a notification of the
>> > upcoming maintenance, and anyone still editing was asked to try again in a
>> > few minutes.
>> >
>> > As a side note, with other SREs we have been trying to discern the effect
>> > of the Switchover in many of the graphs we use to monitor the
>> > infrastructure in https://grafana.wikimedia.org. In many of them, it's
>> > impossible to spot the event. The most discernible graph we have is the
>> > edit rate, which can be viewed here: Grafana
>> > <https://grafana-rw.wikimedia.org/d/000000208/edit-count?from=1677673800000&orgId=1&to=1677681000000>.
>> > Can you spot it? See the attached picture to help:
>> >
>> > I am extending thanks to everyone who was present on IRC, helping out
>> > in any way that they could. Thanks as well to Community Relations, who
>> > notified communities of the read-only window ahead of time. And thanks to
>> > everyone who contributed to MultiDC
>> > <https://wikitech.wikimedia.org/wiki/Performance/Multi-DC_MediaWiki>,
>> > especially Performance for pushing forward with the last parts of it,
>> > allowing us to perform this Switchover faster and with more confidence than
>> > ever before.
>> >
>> > If you want to relive the Switchover, here's a link to a capture of Listen
>> > to Wikipedia <https://en.wikipedia.org/wiki/Listen_to_Wikipedia> during the
>> > Switchover: Listen to the Switchover
>> > <https://drive.google.com/file/d/1jqQUVCq3ksjOM5bKoIfCZ5Zt9RRW1Nl_/view?usp=share_link>
>> > (spoiler: the part with no sounds is the switchover)
>> >
>> > A similar event will follow a few weeks later, when we move back to
>> > Virginia. This is currently scheduled for *Wednesday, April 26th*.
>> >
>> > Thank you,
>> >
>> > On Tue, Feb 21, 2023 at 1:55 PM Clément Goubert <cgoub...@wikimedia.org>
>> > wrote:
>> >
>> >> Dear Wikitechians,
>> >>
>> >> I would like to remind you that the datacenter switchover will happen on
>> >> *Wednesday March 1st* starting at *14:00 UTC.*
>> >>
>> >> Please refer to the original email for any additional information. As
>> >> always, you can reach out to me directly or the SRE team in
>> >> #wikimedia-sre on IRC with any question, or through Phabricator.
>> >>
>> >> Thank you,
>> >>
>> >> On Tue, Feb 14, 2023 at 1:58 PM Clément Goubert <cgoub...@wikimedia.org>
>> >> wrote:
>> >>
>> >>> Dear Wikitechians,
>> >>>
>> >>> On *Wednesday March 1st*, the SRE team will run a planned data center
>> >>> switchover, moving all wikis from our primary data center in Virginia to
>> >>> the secondary data center in Texas. This is an important periodic test of
>> >>> our tools and procedures, to ensure the wikis will continue to be
>> >>> available even in the event of major technical issues in our primary
>> >>> home. It also gives all our SRE and ops teams a chance to do maintenance
>> >>> and upgrades on systems in Virginia that normally run 24 hours a day.
>> >>>
>> >>> The switchover process requires a *brief read-only period for all
>> >>> Foundation-hosted wikis*, which will start at *14:00 UTC on Wednesday
>> >>> March 1st*, and will last for a few minutes while we execute the
>> >>> migration as efficiently as possible. All our public and private wikis
>> >>> will be continuously available for reading as usual, but no one will be
>> >>> able to save edits during the process. Users will see a notification of
>> >>> the upcoming maintenance, and anyone still editing will be asked to try
>> >>> again in a few minutes.
>> >>>
>> >>> CommRel has already begun notifying communities of the read-only window.
>> >>> A similar event will follow a few weeks later, when we move back to
>> >>> Virginia. This is currently scheduled for *Wednesday, April 26th*.
>> >>>
>> >>> If you like, you can follow along on the day in the public
>> >>> #wikimedia-operations channel on IRC (instructions for joining here
>> >>> <https://meta.wikimedia.org/wiki/IRC/Instructions>). To report any
>> >>> issues, you can reach us in #wikimedia-sre on IRC, or file a Phabricator
>> >>> ticket with the *datacenter-switchover* tag (pre-filled form here
>> >>> <https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=Datacenter-Switchover&subscribers=Clement_Goubert>);
>> >>> we'll be monitoring closely for reports of trouble during and after the
>> >>> switchover. (If you're new to Phab, there's more information at
>> >>> Phabricator/Help.) The switchover and its preparation are tracked in
>> >>> Phabricator Task T327920 <https://phabricator.wikimedia.org/T327920>.
>> >>>
>> >>> On behalf of the SRE team, please excuse the disruption, and our thanks
>> >>> to everyone in a number of departments who've been involved in planning
>> >>> this work for the past weeks. Feel free to reply directly to me with any
>> >>> questions.
>> >>>
>> >>> Thank you,
>> >>>
>> >>> --
>> >>> Clément Goubert (they/them)
>> >>> Senior SRE
>> >>> Wikimedia Foundation
>> >>>
>> >>
>> >>
>> >> --
>> >> Clément Goubert (they/them)
>> >> Senior SRE
>> >> Wikimedia Foundation
>> >>
>> >
>> >
>> > --
>> > Clément Goubert (they/them)
>> > Senior SRE
>> > Wikimedia Foundation
>> >
>> _______________________________________________
>> Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
>> To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
>>
>> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
>
>
>
> --
> Amir (he/him)
>



-- 
Giuseppe Lavagetto
Principal Site Reliability Engineer, Wikimedia Foundation
