Starting maintenance.

On Wed, Oct 1, 2025 at 11:54 AM Clément Goubert <[email protected]> wrote:
> Hello everyone,
>
> An update on the status of Maps for the upcoming upgrade.
>
> *Short version:*
>
> *Maps will serve some stale map tiles for the next few hours.*
>
> *Rationale:*
>
> The OSM map tile cache is still being refreshed, there are a lot of
> elements to fetch and we couldn't make that happen before the upgrade. This
> refresh will keep happening during the migration, so the amount of stale
> tiles served will go down as time passes. We decided this was the best of
> the three options available to us, the other two being depooling the
> service entirely and having maps be unavailable for the duration of the
> maintenance, and pushing the date of the upgrade in the future, which would
> snowball into pushing back the eqiad repool.
>
> ---
>
> Object: Kubernetes upgrade to 1.31
>
> Target: eqiad Wikikube cluster
>
> Maintenance window: 2025-10-01 10:00
> <https://zonestamp.toolforge.org/1759312800>-15:00
> <https://zonestamp.toolforge.org/1759330800> UTC
>
> Tracking task: Phabricator at ⚓T405703 Update wikikube eqiad to
> kubernetes 1.31 <https://phabricator.wikimedia.org/T405703>
>
> Operational channel: IRC #wikimedia-sre
> <https://web.libera.chat/gamja/?nick=Guest#wikimedia-sre>, announcements
> will be made to IRC #wikimedia-operations
> <https://web.libera.chat/gamja/?nick=Guest#wikimedia-operations>
>
> Operating team: SRE ServiceOps (contact IRC #wikimedia-serviceops
> <https://web.libera.chat/gamja/?nick=Guest#wikimedia-serviceops>)
>
> Impact:
>
> Users:
>
> - Toolhub will be down for the duration of the window.
> - Maps may experience some perturbation during this maintenance, most
>   probably serving stale map tiles while the cache is being refreshed.
> - No user impact for other services
>
> Deployers:
>
> - Deployments to the target cluster will be unavailable. This includes
>   MediaWiki backports and deployments. DO NOT DEPLOY.
> - The following deployment windows are cancelled:
>   - Services: Citoid/Zotero 11:00 UTC
>     <https://zonestamp.toolforge.org/1759316400>
>   - UTC Afternoon Backport Window 13:00 UTC
>     <https://zonestamp.toolforge.org/1759330800>
>   - Wikifunctions Services UTC Afternoon 14:00 UTC
>     <https://zonestamp.toolforge.org/1759327200>
>
> Process:
>
> All steps handled by SRE ServiceOps
>
> - Maintenance start is announced on #wikimedia-operations and as reply
>   to this email chain
> - All deployments are stopped
> - SRE ServiceOps ensures all current versions of deployments can be
>   safely deployed
> - Maintenance begins and should take a couple of hours
> - Maps is switched over to codfw new stack, perturbations may start
> - Toolhub downtime starts
> - Possible Maps fallback to codfw old stack
> - Cluster is wiped and upgraded
> - Maps and Toolhub are redeployed first to minimize downtime
> - Maps is switched back to eqiad, perturbations end
> - Toolhub downtime stops
> - SRE ServiceOps redeploys all target cluster services
> - Maintenance end is announced on #wikimedia-operations and as reply to
>   this email chain
> - Deployments resume
>
> Rationale:
>
> The date was chosen for convenience as due to the data center switchover
> process <https://wikitech.wikimedia.org/wiki/Switch_Datacenter>, eqiad is
> currently fully depooled, receiving almost no traffic. eqiad is scheduled
> to be repooled on 2025-10-02 <https://zonestamp.toolforge.org/1759417200>,
> which would complicate the upgrade. With eqiad already drained, we expect
> no visible user impact.
>
> SRE ServiceOps will be checking that all services can be safely deployed
> before the maintenance, and will be redeploying all services before marking
> the cluster as usable. Deployers are not required to re-deploy their
> services, unless they have been informed to do so by SRE ServiceOps.
>
> During last week’s switchover <https://phabricator.wikimedia.org/T399891>,
> Toolhub remained in eqiad. This means that there will be an expected
> unavoidable small downtime of a few hours. To minimize Toolhub’s downtime,
> we will prioritize its redeployment during the initialization phase.
>
> As part of the work to upgrade the Maps infrastructure
> <https://phabricator.wikimedia.org/T381565> and bring the kartotherian
> service to Wikikube, kartotherian is currently single-homed in eqiad
> Wikikube, using the old buster-based stack as a backend. The new
> bookworm-based stack in codfw is being brought up quickly, so we will use
> this maintenance as an opportunity to shift traffic to it (Case 1). In
> addition, we are also warming up the old buster-based stack in codfw so we
> can fall back to it in case issues arise (Case 2). As of 15 minutes before
> the maintenance, the OSM map tile cache is still being refreshed. There
> are a lot of elements to fetch and we couldn't make that happen before the
> upgrade. This refresh will keep happening during the migration, so the
> amount of stale tiles served will go down as time passes. We decided this
> was the best of the three options available to us, the other two being
> depooling the service entirely and having maps be unavailable for the
> duration of the maintenance, and pushing the date of the upgrade in the
> future, which would snowball into pushing back the eqiad repool.
>
> Thank you for your understanding and support! If you have any questions
> regarding this process, please respond to this email, comment on
> Phabricator at ⚓T405703 Update wikikube eqiad to kubernetes 1.31
> <https://phabricator.wikimedia.org/T405703>, or reach out directly to me
> (IRC nickname claime on #wikimedia-serviceops
> <https://web.libera.chat/gamja/?nick=Guest#wikimedia-serviceops>).
>
> On behalf of SRE ServiceOps,
>
> On Tue, Sep 30, 2025 at 4:54 PM Clément Goubert <[email protected]> wrote:
>
>> Hello everyone,
>>
>> A quick update on additional impact for the upcoming maintenance. Fully
>> updated maintenance description at the end of this email.
>>
>> Short version:
>>
>> The Maps infrastructure may experience some perturbation during this
>> maintenance.
>>
>> Impact:
>>
>> Users:
>>
>> - Case 1: The new bookworm-based codfw stack performs well and service
>>   disruption should be minimal
>> - Case 2: If errors are experienced with the new codfw stack, the
>>   fallback to the old codfw stack will come with some OSM-data lag, as yet
>>   unmeasurable
>>
>> Mitigation:
>>
>> - Maps will be redeployed with the same priority as Toolhub to minimize
>>   downtime.
>>
>> Rationale:
>>
>> As part of the work to upgrade the Maps infrastructure
>> <https://phabricator.wikimedia.org/T381565> and bring the kartotherian
>> service to Wikikube, kartotherian is currently single-homed in eqiad
>> Wikikube, using the old buster-based stack as a backend.
>>
>> The new bookworm-based stack in codfw is being brought up quickly, so we
>> will use this maintenance as an opportunity to shift traffic to it (case
>> 1). In addition, we are also warming up the old buster-based stack in codfw
>> so we can fall back to it in case issues arise (case 2).
>>
>> ---
>>
>> Object: Kubernetes upgrade to 1.31
>>
>> Target: eqiad Wikikube cluster
>>
>> Maintenance window: 2025-10-01 10:00
>> <https://zonestamp.toolforge.org/1759312800>-15:00
>> <https://zonestamp.toolforge.org/1759330800> UTC
>>
>> Tracking task: Phabricator at ⚓T405703 Update wikikube eqiad to
>> kubernetes 1.31 <https://phabricator.wikimedia.org/T405703>
>>
>> Operational channel: IRC #wikimedia-sre
>> <https://web.libera.chat/gamja/?nick=Guest#wikimedia-sre>, announcements
>> will be made to IRC #wikimedia-operations
>> <https://web.libera.chat/gamja/?nick=Guest#wikimedia-operations>
>>
>> Operating team: SRE ServiceOps (contact IRC #wikimedia-serviceops
>> <https://web.libera.chat/gamja/?nick=Guest#wikimedia-serviceops>)
>>
>> Impact:
>>
>> Users:
>>
>> - Toolhub will be down for the duration of the window.
>> - Maps may experience some perturbation during this maintenance.
>> - No user impact for other services
>>
>> Deployers:
>>
>> - Deployments to the target cluster will be unavailable. This includes
>>   MediaWiki backports and deployments. DO NOT DEPLOY.
>> - The following deployment windows are cancelled:
>>   - Services: Citoid/Zotero 11:00 UTC
>>     <https://zonestamp.toolforge.org/1759316400>
>>   - UTC Afternoon Backport Window 13:00 UTC
>>     <https://zonestamp.toolforge.org/1759330800>
>>   - Wikifunctions Services UTC Afternoon 14:00 UTC
>>     <https://zonestamp.toolforge.org/1759327200>
>>
>> Process:
>>
>> All steps handled by SRE ServiceOps
>>
>> - Maintenance start is announced on #wikimedia-operations and as reply
>>   to this email chain
>> - All deployments are stopped
>> - SRE ServiceOps ensures all current versions of deployments can be
>>   safely deployed
>> - Maintenance begins and should take a couple of hours
>> - Maps is switched over to codfw new stack, perturbations may start
>> - Toolhub downtime starts
>> - Possible Maps fallback to codfw old stack
>> - Cluster is wiped and upgraded
>> - Maps and Toolhub are redeployed first to minimize downtime
>> - Maps is switched back to eqiad, perturbations end
>> - Toolhub downtime stops
>> - SRE ServiceOps redeploys all target cluster services
>> - Maintenance end is announced on #wikimedia-operations and as reply to
>>   this email chain
>> - Deployments resume
>>
>> Rationale:
>>
>> The date was chosen for convenience as due to the data center switchover
>> process <https://wikitech.wikimedia.org/wiki/Switch_Datacenter>, eqiad
>> is currently fully depooled, receiving almost no traffic. eqiad is
>> scheduled to be repooled on 2025-10-02
>> <https://zonestamp.toolforge.org/1759417200>, which would complicate the
>> upgrade. With eqiad already drained, we expect no visible user impact.
>>
>> SRE ServiceOps will be checking that all services can be safely deployed
>> before the maintenance, and will be redeploying all services before marking
>> the cluster as usable.
>> Deployers are not required to re-deploy their
>> services, unless they have been informed to do so by SRE ServiceOps.
>>
>> During last week’s switchover <https://phabricator.wikimedia.org/T399891>,
>> Toolhub remained in eqiad. This means that there will be an expected
>> unavoidable small downtime of a few hours. To minimize Toolhub’s downtime,
>> we will prioritize its redeployment during the initialization phase.
>>
>> As part of the work to upgrade the Maps infrastructure
>> <https://phabricator.wikimedia.org/T381565> and bring the kartotherian
>> service to Wikikube, kartotherian is currently single-homed in eqiad
>> Wikikube, using the old buster-based stack as a backend. The new
>> bookworm-based stack in codfw is being brought up quickly, so we will use
>> this maintenance as an opportunity to shift traffic to it (Case 1). In
>> addition, we are also warming up the old buster-based stack in codfw so we
>> can fall back to it in case issues arise (Case 2).
>>
>> Thank you for your understanding and support! If you have any questions
>> regarding this process, please respond to this email, comment on
>> Phabricator at ⚓T405703 Update wikikube eqiad to kubernetes 1.31
>> <https://phabricator.wikimedia.org/T405703>, or reach out directly to me
>> (IRC nickname claime on #wikimedia-serviceops
>> <https://web.libera.chat/gamja/?nick=Guest#wikimedia-serviceops>).
>>
>> On behalf of SRE ServiceOps,
>>
>> On Mon, Sep 29, 2025 at 5:37 PM Clément Goubert <[email protected]> wrote:
>>
>>> Hello everyone,
>>>
>>> Short version:
>>>
>>> We will be upgrading the eqiad Wikikube kubernetes
>>> <https://wikitech.wikimedia.org/wiki/Kubernetes/Clusters#WikiKube>
>>> cluster to 1.31 on Wednesday 2025-10-01 starting at 10:00 UTC
>>> <https://zonestamp.toolforge.org/1759312800>, ending at 15:00 UTC
>>> <https://zonestamp.toolforge.org/1759330800>.
>>>
>>> Toolhub will be down during this maintenance.
>>>
>>> If you are deploying services to the eqiad Wikikube kubernetes cluster:
>>>
>>> - Deployments will be unavailable during the maintenance. DO NOT
>>>   DEPLOY.
>>> - SRE will redeploy all services
>>> - SRE will announce the end of maintenance, at which point the cluster
>>>   will be usable again
>>>
>>> ---
>>>
>>> Object: Kubernetes upgrade to 1.31
>>>
>>> Target: eqiad Wikikube cluster
>>>
>>> Maintenance window: 2025-10-01 10:00
>>> <https://zonestamp.toolforge.org/1759312800>-15:00
>>> <https://zonestamp.toolforge.org/1759330800> UTC
>>>
>>> Tracking task: Phabricator at ⚓T405703 Update wikikube eqiad to
>>> kubernetes 1.31 <https://phabricator.wikimedia.org/T405703>
>>>
>>> Operational channel: IRC #wikimedia-sre
>>> <https://web.libera.chat/gamja/?nick=Guest#wikimedia-sre>,
>>> announcements will be made to IRC #wikimedia-operations
>>> <https://web.libera.chat/gamja/?nick=Guest#wikimedia-operations>
>>>
>>> Operating team: SRE ServiceOps (contact IRC #wikimedia-serviceops
>>> <https://web.libera.chat/gamja/?nick=Guest#wikimedia-serviceops>)
>>>
>>> Impact:
>>>
>>> Users:
>>>
>>> - Toolhub will be down for the duration of the window.
>>> - No user impact for other services.
>>>
>>> Deployers:
>>>
>>> - Deployments to the target cluster will be unavailable. This includes
>>>   MediaWiki backports and deployments. DO NOT DEPLOY.
>>> - The following deployment windows are cancelled:
>>>   - Services: Citoid/Zotero 11:00 UTC
>>>     <https://zonestamp.toolforge.org/1759316400>
>>>   - UTC Afternoon Backport Window 13:00 UTC
>>>     <https://zonestamp.toolforge.org/1759330800>
>>>   - Wikifunctions Services UTC Afternoon 14:00 UTC
>>>     <https://zonestamp.toolforge.org/1759327200>
>>>
>>> Process:
>>>
>>> All steps handled by SRE ServiceOps
>>>
>>> - Maintenance start is announced on #wikimedia-operations and as reply
>>>   to this email chain
>>> - All deployments are stopped
>>> - SRE ServiceOps ensures all current versions of deployments can be
>>>   safely deployed
>>> - Maintenance begins and should take a couple of hours
>>> - Toolhub downtime starts
>>> - Cluster is wiped and upgraded
>>> - Toolhub is redeployed first to minimize downtime
>>> - Toolhub downtime stops
>>> - SRE ServiceOps redeploys all target cluster services
>>> - Maintenance end is announced on #wikimedia-operations and as reply
>>>   to this email chain
>>> - Deployments resume
>>>
>>> Rationale:
>>>
>>> The date was chosen for convenience as due to the data center
>>> switchover process
>>> <https://wikitech.wikimedia.org/wiki/Switch_Datacenter>, eqiad is
>>> currently fully depooled, receiving almost no traffic. eqiad is scheduled
>>> to be repooled on 2025-10-02
>>> <https://zonestamp.toolforge.org/1759417200>, which would complicate
>>> the upgrade. With eqiad already drained, we expect no visible user impact.
>>>
>>> SRE ServiceOps will be checking that all services can be safely deployed
>>> before the maintenance, and will be redeploying all services before marking
>>> the cluster as usable. Deployers are not required to re-deploy their
>>> services, unless they have been informed to do so by SRE ServiceOps.
>>>
>>> During last week’s switchover
>>> <https://phabricator.wikimedia.org/T399891>, Toolhub remained in eqiad.
>>> This means that there will be an expected unavoidable small downtime of a
>>> few hours. To minimize Toolhub’s downtime, we will prioritize its
>>> redeployment during the initialization phase.
>>>
>>> Thank you for your understanding and support! If you have any questions
>>> regarding this process, please respond to this email, comment on
>>> Phabricator at ⚓T405703 Update wikikube eqiad to kubernetes 1.31
>>> <https://phabricator.wikimedia.org/T405703>, or reach out directly to
>>> me (IRC nickname claime on #wikimedia-serviceops
>>> <https://web.libera.chat/gamja/?nick=Guest#wikimedia-serviceops>).
>>>
>>> On behalf of SRE ServiceOps,
>>>
>>> --
>>> Clément 'claime' Goubert (they/them)
>>> Senior SRE
>>> Wikimedia Foundation
>>
>> --
>> Clément 'claime' Goubert (they/them)
>> Senior SRE
>> Wikimedia Foundation
>
> --
> Clément 'claime' Goubert (they/them)
> Senior SRE
> Wikimedia Foundation

--
Clément 'claime' Goubert (they/them)
Senior SRE
Wikimedia Foundation
_______________________________________________
Wikitech-l mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
