[Wikitech-l] Re: March 2024 Switchover completed successfully
A minor correction, we will be switching to Dallas on Wednesday, September 25th. On Wed, Mar 20, 2024 at 5:23 PM Alexandros Kosiaris wrote: > Hello everyone, > > Please join us in celebrating a very successful Datacenter Switchover. > This switch to our data center in Virginia was run by Effie Mouzeli. > Despite a minor hiccup with Effie's network connection (a similar thing > happened to Clément a year ago, this is starting to become a pattern) it > was completed without a hitch. > > For context, the Site Reliability Team (SRE) runs a planned data center > switchover periodically, moving all wikis from our primary data center > (for this instance, Texas) to the secondary data center (for this instance, > Virginia). This is an important periodic test of our tools and procedures, > to ensure the wikis will continue to be available even in the event of > major technical issues. It also gives all our SRE and ops teams a chance to > do maintenance and upgrades on systems that normally run 24 hours a day. > > The switchover process requires a brief read-only period for all > Foundation-hosted wikis, which started at 14:00 UTC on Wednesday, March > 20th, and lasted 3 minutes and 8 seconds. All our public and private wikis > continued to be available for reading as usual. Users saw a notification of > the upcoming maintenance, and anyone still editing was asked to try again > in a few minutes. > > As with the previous Switchover, I've been trying to discern the effect > of the Switchover in many of the graphs we use to monitor the > infrastructure at https://grafana.wikimedia.org. In many of them, it's impossible > to tell that the event happened. We consider this very nice and attribute it to various > improvements made throughout the years by many teams, in and outside SRE. > The most discernible graph we have is of the edit rate. > > This switchover is our first where we are predominantly on MediaWiki on > Kubernetes, setting a very nice milestone for the project. > > As per our newer process, we no longer have a Switchback. We will be > staying in Virginia as our primary data center for the next 6 months, > switching back to Virginia on Wednesday, September 25th. > > As always, my deepest thanks to all the people who have helped with this, in > one way or another, ranging from the person running point, to all SREs and > developers/deployers participating or having contributed, to people in > Movement Communications for helping with the messaging. > > To report any issues, you can reach us in #wikimedia-sre on IRC, or file a > Phabricator ticket with the datacenter-switchover tag (pre-filled form > here); we'll be monitoring closely for reports of trouble during and after > the switchover. (If you're new to Phab, there's more information at > Phabricator/Help.) The switchover, preparation, as well as follow-up actions > are tracked in Phabricator task T357547. > > -- > Alexandros Kosiaris > Principal Site Reliability Engineer > Wikimedia Foundation > -- Alexandros Kosiaris Principal Site Reliability Engineer Wikimedia Foundation ___ Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
[Wikitech-l] March 2024 Switchover completed successfully
Hello everyone, Please join us in celebrating a very successful Datacenter Switchover. This switch to our data center in Virginia was run by Effie Mouzeli. Despite a minor hiccup with Effie's network connection (a similar thing happened to Clément a year ago, this is starting to become a pattern) it was completed without a hitch. For context, the Site Reliability Team (SRE) runs a planned data center switchover periodically, moving all wikis from our primary data center (for this instance, Texas) to the secondary data center (for this instance, Virginia). This is an important periodic test of our tools and procedures, to ensure the wikis will continue to be available even in the event of major technical issues. It also gives all our SRE and ops teams a chance to do maintenance and upgrades on systems that normally run 24 hours a day. The switchover process requires a brief read-only period for all Foundation-hosted wikis, which started at 14:00 UTC on Wednesday, March 20th, and lasted 3 minutes and 8 seconds. All our public and private wikis continued to be available for reading as usual. Users saw a notification of the upcoming maintenance, and anyone still editing was asked to try again in a few minutes. As with the previous Switchover, I've been trying to discern the effect of the Switchover in many of the graphs we use to monitor the infrastructure at https://grafana.wikimedia.org. In many of them, it's impossible to tell that the event happened. We consider this very nice and attribute it to various improvements made throughout the years by many teams, in and outside SRE. The most discernible graph we have is of the edit rate. This switchover is our first where we are predominantly on MediaWiki on Kubernetes, setting a very nice milestone for the project. As per our newer process, we no longer have a Switchback. We will be staying in Virginia as our primary data center for the next 6 months, switching back to Virginia on Wednesday, September 25th. As always, my deepest thanks to all the people who have helped with this, in one way or another, ranging from the person running point, to all SREs and developers/deployers participating or having contributed, to people in Movement Communications for helping with the messaging. To report any issues, you can reach us in #wikimedia-sre on IRC, or file a Phabricator ticket with the datacenter-switchover tag (pre-filled form here); we'll be monitoring closely for reports of trouble during and after the switchover. (If you're new to Phab, there's more information at Phabricator/Help.) The switchover, preparation, as well as follow-up actions are tracked in Phabricator task T357547. -- Alexandros Kosiaris Principal Site Reliability Engineer Wikimedia Foundation ___ Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
[Wikitech-l] Re: User style to reduce Gerritbot comments on Phabricator
I've just installed it too, and this is indeed really nice. Thanks for this! On Thu, Nov 30, 2023 at 2:37 AM Bartosz Dziewoński wrote: > On 2023-11-29 23:20, Zoran Dori wrote: > > Wow, thank you Bartosz, it looks AMAZING! > > > > How can we install it? > > Thanks :) > > You'll need to install a browser extension (add-on) that allows adding > user styles, and then add this style to it. > > There are buttons on the page I linked that should somewhat guide you > through it. The simplest way is to click "Get Stylus", follow the links > there to your browser's add-ons site, and install it; then come back to > that page, click "Install" and then confirm it. > > > -- > Bartosz Dziewoński > ___ > Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org > To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org > https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/ -- Alexandros Kosiaris Principal Site Reliability Engineer Wikimedia Foundation ___ Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
[Wikitech-l] Datacenter Switchover process change
Hello, I'd like to inform everyone of what we consider a big change (for the better) in the Datacenter Switchover process. The full rationale, planning and implementation are documented at https://wikitech.wikimedia.org/wiki/Switch_Datacenter/Recurring,_Equinox-based,_Data_Center_Switchovers and it includes a TL;DR that I am pasting below for everyone's convenience: Site Reliability Engineering will, starting September 2023, run a data center Switchover every 6 months, in the week of the solar Equinox <https://en.wikipedia.org/wiki/Equinox>, namely the *work weeks containing March 21st and September 21st*. If you are interested in learning more about Switchovers and why we perform them, or already know what they are and want to learn more about how this proposal would impact your workflows or the Wikimedia Movement, please read on. We hope that making the Switchover dates and duration predictable will allow the teams involved in and/or utilizing a Switchover, as well as the entire movement, to reap the benefits we anticipate and document in the doc linked above. Regards, -- Alexandros Kosiaris Principal Site Reliability Engineer Wikimedia Foundation ___ Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
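To make the scheduling rule above concrete, here is a small illustrative sketch (not an official tool) that, for a given year, prints the Monday-to-Friday work week containing March 21st and September 21st:

    from datetime import date, timedelta

    def switchover_week(year, month, day=21):
        # Monday..Friday of the work week containing the given date
        d = date(year, month, day)
        monday = d - timedelta(days=d.weekday())
        return monday, monday + timedelta(days=4)

    for month in (3, 9):
        start, end = switchover_week(2024, month)
        print(start.isoformat(), "to", end.isoformat())

For 2024, for instance, this prints the week of March 18th-22nd, which is the week the March 2024 Switchover actually ran (Wednesday, March 20th); the precise date picked each cycle can of course still be adjusted in practice, as the September 2024 date mentioned above shows.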
[Wikitech-l] Re: eqiad wikikube kubernetes cluster upgrade on 2023-03-07
Hello everyone, The upgrade is done. The cluster has been successfully upgraded to 1.23 and applications have just been redeployed. Toolhub is operational again. On Fri, Mar 3, 2023 at 3:45 PM Alexandros Kosiaris wrote: > Hello everyone, > > TL;DR Toolhub will have a few hours of downtime due to maintenance on > Tuesday 2023-03-07. Furthermore, if you are not deploying services to the > eqiad wikikube kubernetes > cluster, you can safely skip the rest. > > Long version: > > We will reinitialize the eqiad wikikube kubernetes cluster using > kubernetes > version 1.23 on 2023-03-07 09:00-16:00 UTC [1] (the actual process is > expected > to take a couple of hours within this window). > The date was chosen for convenience as, due to the data center switchover > process, eqiad is fully depooled, receiving almost no traffic. This is > scheduled to change on 2023-03-08, making the process more difficult. As > all traffic > has been drained already, we expect no visible impact. However, for the > duration of the process, the kubernetes cluster will be unavailable to > deployers and thus efforts to deploy to it will fail or, worse, not have the > expected outcomes. > This is normal until SRE serviceops announces that the cluster is fully > operational again. > > SRE serviceops will be deploying all services before marking the cluster as > usable, so there will be no need for deployers to > re-deploy their services (apart from those already informed). > > Toolhub, per https://phabricator.wikimedia.org/T329319, wasn't switched > over to codfw and is still being served from wikikube eqiad. Unavoidably, > it will suffer a small downtime of a few hours. That is known and expected. > To minimize that downtime, it will be prioritized during the initialization > phase. > > [1] https://phabricator.wikimedia.org/T331126 > > -- > Alexandros Kosiaris > Principal Site Reliability Engineer > Wikimedia Foundation > -- Alexandros Kosiaris Principal Site Reliability Engineer Wikimedia Foundation ___ Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
[Wikitech-l] eqiad wikikube kubernetes cluster upgrade on 2023-03-07
Hello everyone, TL;DR Toolhub will have a few hours of downtime due to maintenance on Tuesday 2023-03-07. Furthermore, if you are not deploying services to the eqiad wikikube kubernetes cluster, you can safely skip the rest. Long version: We will reinitialize the eqiad wikikube kubernetes cluster using kubernetes version 1.23 on 2023-03-07 09:00-16:00 UTC [1] (the actual process is expected to take a couple of hours within this window). The date was chosen for convenience as, due to the data center switchover process, eqiad is fully depooled, receiving almost no traffic. This is scheduled to change on 2023-03-08, making the process more difficult. As all traffic has been drained already, we expect no visible impact. However, for the duration of the process, the kubernetes cluster will be unavailable to deployers and thus efforts to deploy to it will fail or, worse, not have the expected outcomes. This is normal until SRE serviceops announces that the cluster is fully operational again. SRE serviceops will be deploying all services before marking the cluster as usable, so there will be no need for deployers to re-deploy their services (apart from those already informed). Toolhub, per https://phabricator.wikimedia.org/T329319, wasn't switched over to codfw and is still being served from wikikube eqiad. Unavoidably, it will suffer a small downtime of a few hours. That is known and expected. To minimize that downtime, it will be prioritized during the initialization phase. [1] https://phabricator.wikimedia.org/T331126 -- Alexandros Kosiaris Principal Site Reliability Engineer Wikimedia Foundation ___ Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
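Once SRE serviceops announces that the cluster is fully operational again, a deployer can sanity-check that the reinitialized control plane and nodes are indeed on the new release before deploying. A generic, illustrative check (not part of the official deployment procedure) would be:

    kubectl version --short   # the Server Version line should report v1.23.x
    kubectl get nodes         # all nodes should be Ready and show the new version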
[Wikitech-l] Re: [Ops] codfw wikikube kubernetes cluster upgrade on 2023-02-21
Hello everyone, The cluster was successfully re-initialized today; all services have been re-pooled and are in service. The cluster is fully operational again and can be used by deployers. Regards, On Wed, Feb 15, 2023 at 1:41 PM Janis Meybohm wrote: > Hello everyone, > > TL;DR if you are not deploying services to the codfw wikikube kubernetes > cluster, you can safely skip this. > > Long version: > > We will reinitialize the codfw wikikube kubernetes cluster with kubernetes > version 1.23 on 2023-02-21 09:00-16:00 UTC [1] (the actual process is > expected > to take a couple of hours within this window). > The date was chosen for convenience as we will have depooled all > active/active > services from codfw for row B switch maintenance [2] anyway. As all > traffic > will be drained beforehand, we expect no user-visible impact. However, for > the > duration of the process, the kubernetes cluster will be unavailable to > deployers and thus efforts to deploy to it will fail or, worse, not have > the > expected outcomes. > This is normal until SRE serviceops announces that the cluster is fully > operational again. > > SRE serviceops will be deploying all services before marking the cluster > as > usable and pooling traffic back to it, so there will be no need for > deployers to > re-deploy their services (apart from those already informed). > > [1] https://phabricator.wikimedia.org/T329664 > [2] https://phabricator.wikimedia.org/T327991 > > Regards, > Janis Meybohm > > ___ > Ops mailing list -- o...@lists.wikimedia.org > To unsubscribe send an email to ops-le...@lists.wikimedia.org > -- Alexandros Kosiaris Principal Site Reliability Engineer Wikimedia Foundation ___ Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
Re: [Wikitech-l] eqiad kubernetes cluster upgrade on 2021-03-23
Hello everyone, This has happened. The cluster has been reinitialized and upgraded and all services have been redeployed by SRE Service Operations. So, the cluster is fully operational again; feel free to deploy. Traffic hasn't been switched back yet, as we are still making sure that it's also fully traffic-capable, but that is expected to happen by tomorrow at the latest. On Tue, Mar 23, 2021 at 10:23 AM Alexandros Kosiaris wrote: > > Hello everyone, > > This is starting now. Keep in mind that if you try to deploy to eqiad > k8s today, it WILL fail or just won't do what you expect it to do. > > On Fri, Mar 19, 2021 at 10:02 PM Alexandros Kosiaris > wrote: > > > > Hello everyone, > > > > TL;DR if you are not deploying services to the eqiad kubernetes > > cluster, you can safely skip this. > > > > Long version: > > > > After having tested thrice our cluster reinitialization procedure, next > > week, on Tuesday 2021-03-23 we will be reinitializing our eqiad > > kubernetes cluster. All > > traffic will be drained from it beforehand and we expect no user > > visible impact. However, for the duration of the process, the > > kubernetes eqiad cluster will be unavailable to deployers and thus > > efforts to deploy to it will fail or worse, not have the expected > > outcomes. This is normal until SRE serviceops announces that the > > cluster is fully operational again. > > > > SRE service-ops will be deploying all services before marking the > > cluster as usable and pooling traffic back to it, so there will be no > > need for deployers to re-deploy their services. > > > > For your convenience the list of services that are currently deployed > > on that cluster is: apertium api-gateway blubberoid changeprop > > changeprop-jobqueue citoid cxserver echostore eventgate-analytics > > eventgate-analytics-external eventgate-logging-external eventgate-main > > eventstreams eventstreams-internal linkrecommendation mathoid > > mobileapps proton push-notifications recommendation-api sessionstore > > similar-users termbox wikifeeds zotero > > > > Regards, > > > > -- > > Alexandros Kosiaris > > Principal Site Reliability Engineer > > Wikimedia Foundation > > > > -- > Alexandros Kosiaris > Principal Site Reliability Engineer > Wikimedia Foundation -- Alexandros Kosiaris Principal Site Reliability Engineer Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] eqiad kubernetes cluster upgrade on 2021-03-23
Hello everyone, This is starting now. Keep in mind that if you try to deploy to eqiad k8s today, it WILL fail or just won't do what you expect it to do. On Fri, Mar 19, 2021 at 10:02 PM Alexandros Kosiaris wrote: > > Hello everyone, > > TL;DR if you are not deploying services to the eqiad kubernetes > cluster, you can safely skip this. > > Long version: > > After having tested thrice our cluster reinitialization procedure, next > week, on Tuesday 2021-03-23 we will be reinitializing our eqiad > kubernetes cluster. All > traffic will be drained from it beforehand and we expect no user > visible impact. However, for the duration of the process, the > kubernetes eqiad cluster will be unavailable to deployers and thus > efforts to deploy to it will fail or worse, not have the expected > outcomes. This is normal until SRE serviceops announces that the > cluster is fully operational again. > > SRE service-ops will be deploying all services before marking the > cluster as usable and pooling traffic back to it, so there will be no > need for deployers to re-deploy their services. > > For your convenience the list of services that are currently deployed > on that cluster is: apertium api-gateway blubberoid changeprop > changeprop-jobqueue citoid cxserver echostore eventgate-analytics > eventgate-analytics-external eventgate-logging-external eventgate-main > eventstreams eventstreams-internal linkrecommendation mathoid > mobileapps proton push-notifications recommendation-api sessionstore > similar-users termbox wikifeeds zotero > > Regards, > > -- > Alexandros Kosiaris > Principal Site Reliability Engineer > Wikimedia Foundation -- Alexandros Kosiaris Principal Site Reliability Engineer Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] eqiad kubernetes cluster upgrade on 2021-03-23
Hello everyone, TL;DR if you are not deploying services to the eqiad kubernetes cluster, you can safely skip this. Long version: After having tested our cluster reinitialization procedure thrice, next week, on Tuesday 2021-03-23, we will be reinitializing our eqiad kubernetes cluster. All traffic will be drained from it beforehand and we expect no user-visible impact. However, for the duration of the process, the kubernetes eqiad cluster will be unavailable to deployers and thus efforts to deploy to it will fail or, worse, not have the expected outcomes. This is normal until SRE serviceops announces that the cluster is fully operational again. SRE service-ops will be deploying all services before marking the cluster as usable and pooling traffic back to it, so there will be no need for deployers to re-deploy their services. For your convenience, the list of services that are currently deployed on that cluster is: apertium api-gateway blubberoid changeprop changeprop-jobqueue citoid cxserver echostore eventgate-analytics eventgate-analytics-external eventgate-logging-external eventgate-main eventstreams eventstreams-internal linkrecommendation mathoid mobileapps proton push-notifications recommendation-api sessionstore similar-users termbox wikifeeds zotero Regards, -- Alexandros Kosiaris Principal Site Reliability Engineer Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] codfw kubernetes cluster upgrade this week
Hi, This has happened now. Out of an abundance of caution, the cluster isn't going to be repooled right now but rather tomorrow EU morning; it's otherwise fully operational. Deploys are fully functional again, so if anything breaks, please let us know in Phabricator. The related task is https://phabricator.wikimedia.org/T277191, if you care to follow up on the last few steps. On Tue, Mar 16, 2021 at 10:31 AM Alexandros Kosiaris wrote: > > Hello everyone, > > TL;DR if you are not deploying services to the codfw kubernetes > cluster, you can safely skip this. > > Long version: > > After having tested twice our cluster reinitialization procedure, this > week we will be reinitializing our codfw kubernetes cluster. All > traffic will be drained from it beforehand and we expect no user > visible impact. However, for the duration of the process, the > kubernetes codfw cluster will be unavailable to deployers and thus > efforts to deploy to it will fail or worse, not have the expected > outcomes. This is normal until SRE serviceops announces that the > cluster is fully operational again. > > SRE service-ops will be deploying all services before marking the > cluster as usable and pooling traffic back to it, so there will be no > need for deployers to re-deploy their services. > > For your convenience the list of services that are currently deployed > on that cluster is: apertium api-gateway blubberoid changeprop > changeprop-jobqueue citoid cxserver echostore eventgate-analytics > eventgate-analytics-external eventgate-logging-external eventgate-main > eventstreams eventstreams-internal linkrecommendation mathoid > mobileapps proton push-notifications recommendation-api sessionstore > similar-users termbox wikifeeds zotero > > Regards, > > -- > Alexandros Kosiaris > Principal Site Reliability Engineer > Wikimedia Foundation -- Alexandros Kosiaris Principal Site Reliability Engineer Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] codfw kubernetes cluster upgrade this week
Hello everyone, TL;DR if you are not deploying services to the codfw kubernetes cluster, you can safely skip this. Long version: After having tested our cluster reinitialization procedure twice, this week we will be reinitializing our codfw kubernetes cluster. All traffic will be drained from it beforehand and we expect no user-visible impact. However, for the duration of the process, the kubernetes codfw cluster will be unavailable to deployers and thus efforts to deploy to it will fail or, worse, not have the expected outcomes. This is normal until SRE serviceops announces that the cluster is fully operational again. SRE service-ops will be deploying all services before marking the cluster as usable and pooling traffic back to it, so there will be no need for deployers to re-deploy their services. For your convenience, the list of services that are currently deployed on that cluster is: apertium api-gateway blubberoid changeprop changeprop-jobqueue citoid cxserver echostore eventgate-analytics eventgate-analytics-external eventgate-logging-external eventgate-main eventstreams eventstreams-internal linkrecommendation mathoid mobileapps proton push-notifications recommendation-api sessionstore similar-users termbox wikifeeds zotero Regards, -- Alexandros Kosiaris Principal Site Reliability Engineer Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Fwd: Kubelet / Docker / dockershim
Forwarding this from the upstream kubernetes mailing list. The TL;DR is that with the release that is due for September 2021 (assuming that happens as planned), Docker will no longer be usable as a Container Runtime Engine for vanilla Kubernetes. And that's it. All other usages of Docker remain unchanged. Given the support cycle of 12 months after a release is out, that gives us somewhat less than 2 years to evaluate the available replacements, settle on one, and draft and implement a migration plan. It's a pretty early warning, which is nice. The above sounds more complicated than it will probably prove, for what it's worth (although the devil is always in the details). As far as running services in our Wikimedia production kubernetes clusters goes, we never invested in Docker-specific features/customizations on purpose, choosing to treat it as a replaceable part of the infrastructure, which should make this easier than initially thought. I've created https://phabricator.wikimedia.org/T269684 for tracking. -- Forwarded message - From: Davanum Srinivas Date: Sun, Dec 6, 2020, 05:53 Subject: Kubelet / Docker / dockershim To: Kubernetes developer/contributor discussion , Folks, If you haven't seen the discussions around $SUBJECT, please see [1] and [2]. Tl;dr Please evaluate and switch to CRI implementations that are or will be available in the community (like containerd, cri-o etc). For those who want to continue to use docker as their runtime, please see [3] and [4]. There will be changes to how you deploy/run your clusters as and when Mirantis/Docker folks come up with a migration plan for a separate (new!) external cri implementation. So watch that space. Issues, concerns, we can chat in sig-node slack channel or meetings (or drop a reply to this note). Thanks, Dims [1] https://kubernetes.io/blog/2020/12/02/dont-panic-kubernetes-and-docker/ [2] https://kubernetes.io/blog/2020/12/02/dockershim-faq/ [3] https://twitter.com/justincormack/status/1334976974083780609 [4] https://github.com/Mirantis/cri-dockerd -- Davanum Srinivas :: https://twitter.com/dims -- You received this message because you are subscribed to the Google Groups "Kubernetes developer/contributor discussion" group. To unsubscribe from this group and stop receiving emails from it, send an email to kubernetes-dev+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/kubernetes-dev/CANw6fcHRq%2BadjSkrt1dVQfFtcEc1sqtWRY7LktDFxKLt537Kkg%40mail.gmail.com. -- Alexandros Kosiaris Principal Site Reliability Engineer Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
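For context on what "Docker as a Container Runtime Engine" means at the node level: each kubelet talks to a container runtime, and a generic way to see which runtime a cluster's nodes currently use is (illustrative kubectl usage, not a Wikimedia-specific procedure):

    kubectl get nodes -o wide
    # The CONTAINER-RUNTIME column shows the runtime per node,
    # e.g. docker://19.x before a migration vs containerd://1.x after one.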
[Wikitech-l] Fwd: OTRS major version upgrade on Monday September 14th 2020
FYI for wikitech-l as well. -- Forwarded message - From: Alexandros Kosiaris Date: Wed, Sep 16, 2020 at 11:16 AM Subject: Re: OTRS major version upgrade on Monday September 14th 2020 To: Private list for OTRS administrators , Hello everyone, This is to inform you that the upgrade of OTRS to 6.0.29 is now complete. The migration has gone well and thankfully has finished on time. The backlog of incoming emails seems to have arrived at the system as expected and new tickets have been created. I've validated the various functionalities that I had planned to validate and everything seems to be in order (have a look at T187984 if you are interested in the technical details). https://ticket.wikimedia.org is now available again for normal use. As before, bugs that may have arisen due to the upgrade should be filed under the OTRS project in phabricator, preferably under the OTRS 6 column. Thank you for your patience, On Fri, Sep 4, 2020 at 2:20 PM Alexandros Kosiaris wrote: > > Hello everyone, > > This is to let you know that per > https://phabricator.wikimedia.org/T187984#6433997 on September 14th > 2020 a 48-hour maintenance window for the OTRS upgrade from 5.0 to 6.0 will begin. > While this is an unusually large time window, the migration process > from 5.0 to 6.0 is rather slow. We have looked into optimizing it, but > in the end preferred to stick with the software developers' > recommendations. > > During the maintenance window the system will be completely offline. That > means: > > * No access over the web to the interface > * No scheduled jobs of any kind will be run > * Email will not be delivered but rather backlogged. It will not be > lost, as our MX systems will accept it and put it in the queue. > Once the system is back to being fully functional, the emails will > flow into the system. > > We will have a rollback plan ready, of course, just in case the > migration goes awry. We have already tested it, but something might > arise anyway. > > I am sorry for any inconvenience this might cause. > > -- > Alexandros Kosiaris > Principal Site Reliability Engineer > Wikimedia Foundation -- Alexandros Kosiaris Principal Site Reliability Engineer Wikimedia Foundation -- Alexandros Kosiaris Principal Site Reliability Engineer Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] OTRS major version upgrade on Monday September 14th 2020
I've already sent the email below to OTRS-related mailing lists, but I think wikitech-l can also benefit from this, so here it is. -- Hello everyone, This is to let you know that per https://phabricator.wikimedia.org/T187984#6433997 on September 14th 2020 a 48-hour maintenance window for the OTRS upgrade from 5.0 to 6.0 will begin. While this is an unusually large time window, the migration process from 5.0 to 6.0 is rather slow. We have looked into optimizing it, but in the end preferred to stick with the software developers' recommendations. During the maintenance window the system will be completely offline. That means: * No access over the web to the interface * No scheduled jobs of any kind will be run * Email will not be delivered but rather backlogged. It will not be lost, as our MX systems will accept it and put it in the queue. Once the system is back to being fully functional, the emails will flow into the system. We will have a rollback plan ready, of course, just in case the migration goes awry. We have already tested it, but something might arise anyway. I am sorry for any inconvenience this might cause. -- Alexandros Kosiaris Principal Site Reliability Engineer Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
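On the mail side, "backlogged" simply means messages sit in the MX servers' queue until the system accepts deliveries again, at which point the queue drains on its own. As a rough illustration of how such a backlog can be watched (generic Exim commands; the actual operational runbook may differ):

    exim -bpc       # number of messages currently sitting in the queue
    mailq | tail    # a peek at the most recently queued messages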
[Wikitech-l] Debian Jessie base Wikimedia container image being phased out
Hello everyone, In the interest of proceeding with the deprecation of Jessie in our infrastructure, SRE ServiceOps will stop maintaining and eventually remove the base Debian Jessie OCI container image[1], also known as wikimedia-jessie, from our docker registry. If you rely on this image in any way, you are strongly urged to move to our Debian Stretch or Buster images, which will continue to be maintained for some time (for more information see https://wikitech.wikimedia.org/wiki/Operating_system_upgrade_policy). A tentative timeline follows: * 2020-07-02. https://gerrit.wikimedia.org/r/c/operations/puppet/+/587529 will be merged. This means the image will no longer receive any kind of updates, but it will still be possible to pull it from the registry. Workflows are not expected to break. * 2020-08-03. All tags and versions of the above image will be removed. That means the image will no longer be pullable from the registry. If you have any workflows that rely on pulling this image from the registry, they are expected to break on this date. Images that have already been published and are based on the Jessie image, however, will not be touched and will still be pullable. If you have a workflow that will break after the 2020-08-03 mark and are unable to alter it, please reach out to us. Following the above, we will also investigate whether removing image versions and tags that depend on the removed image makes sense. Note that this is the first time we remove a base image from our registry, and as such we are building knowledge and experience around this. [1] https://tools.wmflabs.org/dockerregistry/wikimedia-jessie/tags/ Regards, -- Alexandros Kosiaris Principal Site Reliability Engineer Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
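For images built on top of the Jessie base, the migration usually boils down to swapping the FROM line in the Dockerfile and rebuilding. A rough sketch (the registry path and tag shown here are assumptions for illustration; check the registry listing for the actual image names):

    # Before (unmaintained after 2020-07-02, unpullable after 2020-08-03):
    # FROM docker-registry.wikimedia.org/wikimedia-jessie:latest
    # After, rebased on a maintained image:
    FROM docker-registry.wikimedia.org/wikimedia-buster:latest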
Re: [Wikitech-l] Scrum of scrums/2019-11-27
I think a correction is due. * RelEng blocking Product Infrastructure: to create node10/buster images for Proton service migration [[phab:T237911]] This is actually SRE blocking Product Infrastructure, as SRE owns those images. On Wed, Nov 27, 2019 at 7:03 PM Željko Filipin wrote: > > Hi, > > for HTML version see > https://www.mediawiki.org/wiki/Scrum_of_scrums/2019-11-27 > > Željko > > -- > > = 2019-11-27 = > > == Callouts == > * Release Engineering - unusual train schedule: > *** This week: 1.35.0-wmf.8 - group0 only because of Thanksgiving > *** Next week: 1.35.0-wmf.8 - group1 + group2 > * RelEng blocking Product Infrastructure: to create node10/buster images > for Proton service migration [[phab:T237911]] > * Biggest fundraising campaign of the year hits enwiki (and in Canada also > frwiki) on Dec 2. Let's keep CentralNotice stable! > > == Product == > > === Android native app === > * Updates: > ** Suggested Edits v3 features and fundraising banner updates are now live > in production. > ** Working on mobile-html integration: [[phab:project/view/4318]] > > === Product Infrastructure === > * Blocked by: > ** RelEng: to create node10/buster images for Proton service migration > [[phab:T237911]] > * Updates: > ** Maps: > *** Investigating new OSM replication engine [[phab:T238554]] > ** Proton: > *** Moving Proton to debian buster, blocked on RelEng for node10/buster > images [[phab:T237911]] > > === Structured Data === > * Blocking: > ** Search Platform: Data dumps for SDC: [[phab:T221917]] > * Updates: > ** finishing off computer-aided tagging > ** finishing off Lua support > ** finishing off blockers for structured data in dumps > ** adding new input types > > === Inuka === > * Updates: > ** KaiOS app settings menu [[phab:T236265]] [[phab:T236312]] > [[phab:T236314]] > > == Technology == > > === Fundraising Tech === > * Updates: > ** Battening down the hatches for the December fundraiser > ** CiviCRM > *** Fixing duplicate mailing event records imported from bulk mail house > *** Investigating weird output from audit file parser for backup credit > card processor > *** Adding UI buttons to send end-of-year rollup emails on demand > ** CentralNotice > *** Reviewing and finishing up sub-national geotargeting > *** Still comparing stats between new and old data pipelines > ** Paymentswiki > *** small tweaks to CSS and fraud filters > > === Core Platform === > * Updates: > ** Page history REST endpoints out on train > ** "Endgame" API gateway planning > ** MCR Schema conversion > > === Engineering Productivity === > > Release Engineering > * Blocking: > ** Product Infrastructure: to create node10/buster images for Proton > service migration [[phab:T237911]] > * Updates: > ** Train Health > *** Last week: no train because of team offsite > *** This week: 1.35.0-wmf.8 - [[phab:T233856]] - group0 only because of > Thanksgiving > *** Next week: 1.35.0-wmf.8 - [[phab:T233856]] - group1 + group2 > > === Search Platform === > * Blocked by: > ** Structured Data: Data dumps for SDC: [[phab:T221917]] > * Updates: > ** Log Wikidata Query Service queries (sparql) to the event gate > infrastructure [[phab:T101013]] (soon to be deployed) > ** Increase logging sampling rates for search metrics from 12.5% to 100% > (8x) [[phab:T197129]] > ___ > Wikitech-l mailing list > Wikitech-l@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/wikitech-l -- Alexandros Kosiaris Principal Site Reliability Engineer Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org 
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Scrum of scrums/2019-06-19
> Also, is Site Reliability Engineering blocking WMDE or not? > > === Site Reliability Engineering === > * Blocking: > ** WMDE has been unblocked on termbox No, we (SRE) are not blocking WMDE anymore; that was a mistake on my part in the notes and should have been filed under updates. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Small change to wmf apache log format
In light of the Gerrit incidents these last few days, and as part of the process of strengthening Gerrit's operational security, we've just gone ahead and configured Gerrit to add the User: HTTP header to the response. To take advantage of that, we've also amended the wmf apache LogFormat directive to log that header if it exists. I've documented the change in https://wikitech.wikimedia.org/wiki/Apache_log_format. Note that the field layout changes just a bit (the last field is now field 17 instead of 16; field 16 is now the User: HTTP header if it exists, otherwise a -). If you are aware of anything that might break because of that, let us know. Regards, -- Alexandros Kosiaris Senior Site Reliability Engineer Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
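For anyone parsing these logs, the mechanism is Apache's %{...}o directive, which logs a named response header (or "-" when it is absent). A simplified sketch of such a LogFormat follows; it is not the actual wmf format string, which has more fields and is documented on the wikitech page above:

    # %{User}o is the "User:" response header set by Gerrit; %D is the request duration
    LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" \"%{User}o\" %D" wmf-example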
Re: [Wikitech-l] Gerrit outage
Hello Fæ, While I understand and agree with your point, I must point out that these 4 days have been hectic for many people from multiple teams. The amount of work to clean up one person's destructive half-hour spree is staggering. We certainly need better tooling to combat this, something that MediaWiki is already equipped with but some of the infrastructure tools are not (yet, hopefully). It saddens me greatly to say that, but we might have to take some steps in the opposite direction, for a while at least, until we are in shape to combat this more effectively. Regards, On Tue, Mar 19, 2019 at 3:40 PM Fæ wrote: > Thanks to everyone who helped sort this out. > > In some ways, the vandalism neatly demonstrates how Wikimedia projects > rely on trust. When these things happen, it is a nice reminder that > our open values mean that we should take a light approach to security > whenever the potential exposure is always going to be recoverable. > Resilience rather than impenetrable, for our community at least, is a > healthy way to prioritize. The occasional predictable idiot is no > reason to change that approach. > > Cheers, > Fae > > On Tue, 19 Mar 2019 at 13:28, Alexandros Kosiaris > wrote: > > > > Gerrit is back up. Almost all of the vandalism has been cleaned up, > > some minor stuff remains, we will clean that up as well. > > > > On Tue, Mar 19, 2019 at 1:42 PM planetenxin wrote: > > > > > > On 19.03.2019 at 12:21 Andre Klapper wrote: > > > > planetenxin: Sorry for my previous message, was not meant to be rude. > > > > > > no worries. Hope, that Gerrit is back alive soon. :-) > > > > > > ___ > > > Wikitech-l mailing list > > > Wikitech-l@lists.wikimedia.org > > > https://lists.wikimedia.org/mailman/listinfo/wikitech-l > > -- > fae...@gmail.com https://commons.wikimedia.org/wiki/User:Fae > > ___ > Wikitech-l mailing list > Wikitech-l@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/wikitech-l -- Alexandros Kosiaris Senior Site Reliability Engineer Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Gerrit outage
Gerrit is back up. Almost all of the vandalism has been cleaned up; some minor stuff remains, and we will clean that up as well. On Tue, Mar 19, 2019 at 1:42 PM planetenxin wrote: > > On 19.03.2019 at 12:21 Andre Klapper wrote: > > planetenxin: Sorry for my previous message, was not meant to be rude. > > no worries. Hope, that Gerrit is back alive soon. :-) > > ___ > Wikitech-l mailing list > Wikitech-l@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/wikitech-l -- Alexandros Kosiaris Senior Site Reliability Engineer Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Code Stewardship Reviews candidates open for comment
Many many thanks! On Mon, Dec 17, 2018 at 7:21 AM Jean-Rene Branaa wrote: > > Hello Alexandros, > > Makes sense. We'll extend the current review cycle until January 16th, > 2019 in order to allow more time due to the coming holidays. > > Cheers, > > JR > > > > On Sun, Dec 16, 2018 at 00:30 Alexandros Kosiaris > wrote: > > > Hi, > > > > Any chance the Dec 28th deadline be altered? It's end of the quarter and > > with the goal wrapping and PTOs, holidays etc I find it a bit difficult to > > collect all the necessary data to help make an informed decision. > > > > Στις Πέμ, 13 Δεκ 2018, 21:00 ο χρήστης Jean-Rene Branaa < > > jbra...@wikimedia.org> έγραψε: > > > > > Hello All, > > > > > > We've opened the feedback cycle for the current Code Stewardship > > > Review candidates. They include: > > > > > > CodeReview extension[0] > > > UserMerge extension[1] > > > Graphoid service[2] > > > > > > Feedback can be provided via the talk pages for each of the items > > > under review and/or their associated Phabricator tasks. > > > > > > The Code Stewardship review process[3] is intended to help address > > > code deployed to production that is un/under funded. The outcome of > > > this process is generally one of three: re-investment, no change, or > > > sunset (ramp down all investment and remove from production > > > environment). > > > > > > Please provide your feedback before December 28th. > > > > > > Cheers, > > > > > > JR > > > IRC: jrbranaa > > > Release Engineering/Code Health > > > > > > > > > [0] > > > > > https://www.mediawiki.org/wiki/Code_stewardship_reviews/Feedback_solicitation/CodeReview > > > [1] > > > > > https://www.mediawiki.org/wiki/Code_stewardship_reviews/Feedback_solicitation/UserMerge > > > [2] > > > > > https://www.mediawiki.org/wiki/Code_stewardship_reviews/Feedback_solicitation/Graphoid > > > [3]https://www.mediawiki.org/wiki/Code_stewardship_reviews > > > > > > ___ > > > Wikitech-l mailing list > > > Wikitech-l@lists.wikimedia.org > > > https://lists.wikimedia.org/mailman/listinfo/wikitech-l > > ___ > > Wikitech-l mailing list > > Wikitech-l@lists.wikimedia.org > > https://lists.wikimedia.org/mailman/listinfo/wikitech-l > ___ > Wikitech-l mailing list > Wikitech-l@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/wikitech-l -- Alexandros Kosiaris ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Code Stewardship Reviews candidates open for comment
Hi, Any chance the Dec 28th deadline could be altered? It's the end of the quarter, and with goal wrap-up, PTOs, holidays, etc., I find it a bit difficult to collect all the necessary data to help make an informed decision. On Thu, Dec 13, 2018, 21:00 Jean-Rene Branaa < jbra...@wikimedia.org> wrote: > Hello All, > > We've opened the feedback cycle for the current Code Stewardship > Review candidates. They include: > > CodeReview extension[0] > UserMerge extension[1] > Graphoid service[2] > > Feedback can be provided via the talk pages for each of the items > under review and/or their associated Phabricator tasks. > > The Code Stewardship review process[3] is intended to help address > code deployed to production that is un/under funded. The outcome of > this process is generally one of three: re-investment, no change, or > sunset (ramp down all investment and remove from production > environment). > > Please provide your feedback before December 28th. > > Cheers, > > JR > IRC: jrbranaa > Release Engineering/Code Health > > > [0] > https://www.mediawiki.org/wiki/Code_stewardship_reviews/Feedback_solicitation/CodeReview > [1] > https://www.mediawiki.org/wiki/Code_stewardship_reviews/Feedback_solicitation/UserMerge > [2] > https://www.mediawiki.org/wiki/Code_stewardship_reviews/Feedback_solicitation/Graphoid > [3]https://www.mediawiki.org/wiki/Code_stewardship_reviews > > ___ > Wikitech-l mailing list > Wikitech-l@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/wikitech-l ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Datacenter Switchback recap
A minor correction: > During the most critical part of the switch > today, the wikis were in read-only mode for a duration of 4 minutes > and 41 seconds. This was yesterday, not today. -- Alexandros Kosiaris ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Datacenter Switchback recap
Hello everyone, Today we've concluded the successful migration of our wikis (MediaWiki and associated services) from our secondary datacenter (codfw) back to the primary one (eqiad). During the most critical part of the switch today, the wikis were in read-only mode for a duration of 4 minutes and 41 seconds. That's a significant improvement over the 7 mins and 34 seconds we achieved during the inverse process we concluded a month ago, which was already significantly better than last year. I'd like to believe that it's the result of the increasing amount of experience we are building and the trust we are putting in the process and tools that we have developed for this. Although the switchback process itself has been largely automated and went pretty smoothly, there were some issues that we experienced: - CentralNotice banners stayed online for a longer time than necessary due to miscommunication issues. This has now been documented and will be avoided in the future. - After the switchback we experienced increased load on all our MediaWiki application servers. The root cause has been identified and mitigation against it will be put in place. In summary: non-working replication of parsercache between the 2 datacenters. - Last, but not least, and probably the most important of all issues, a data inconsistency was detected in wikidata (s8): namely, some articles were present in codfw but were not replicated in eqiad. We are still investigating the root cause of this while applying corrective actions to mitigate the user impact as quickly as possible. All wikis are now served from our primary data center again. Should you experience any issue that is deemed related to the switchover process, please feel free to file a ticket in Phabricator and tag it with the Datacenter-Switchover-2018 project tag[1]. We will monitor this tag closely and keep any and all issues updated. We'd like to thank everyone for their hard work in ensuring any (potential) issues got resolved in a timely manner, for automating the process whenever and wherever possible, and for making this datacenter switchover and switchback a success! -- Alexandros Kosiaris ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Datacenter switchover and switchback
I am sorry to hear that. It looks like something that we will have to take into account for the next switchovers. That being said, we had deliberations across the involved teams months ago to come up with those exact dates and have been communicating them via at least SoS since 2018-08-01. I am curious about something though. How does the deployment train (cause that's what we are talking about) impact the software release exactly ? On Wed, Oct 3, 2018 at 6:19 PM C. Scott Ananian wrote: > > Oct 8 seems to be a particularly bad time to freeze the train given that we > are forking for the MW 1.32 release on Oct 15, and a lot of folks have > last-minute things they want to get into the release (eg deprecations, etc). > --scott > > On Thu, Aug 30, 2018 at 10:57 AM Pine W wrote: > > > +1 to DJ's question about timing. Also, one might wish to be mindful of > > the number of recent trains that were supposed to be boring but involved > > interesting surprises; this makes me wonder whether trains that one thinks > > will be boring are actually OK in this circumstance even if they turn out > > to be "interesting". > > > > Pine > > ( https://meta.wikimedia.org/wiki/User:Pine ) > > > > > > > > Original message From: Derk-Jan Hartman < > > d.j.hartman+wmf...@gmail.com> Date: 8/30/18 2:54 AM (GMT-08:00) To: > > Wikimedia developers Subject: Re: > > [Wikitech-l] Datacenter switchover and switchback > > While I think these regular switches are a very good idea, from an outside > > perspective I do have to question a process that puts a significant plug in > > the velocity of various teams working on major projects (esp. in a time of > > year that could probably be seen as one of the most productive). What are > > plans to reduce the disruption of this exercise in the future ? > > > > DJ > > > > On Thu, Aug 30, 2018 at 8:38 AM Jaime Crespo > > wrote: > > > > > Let me explain the rationale of the bellow request for clarification: > > > > > > On Wed, Aug 29, 2018 at 11:30 PM MA wrote: > > > > > > > Hello: > > > > > > > > >For the duration of the switchover (1 month), deployers are kindly > > > > >requested to refrain from large db schema changes and avoid deploying > > > > >any kind of new feature that requires creation of tables. > > > > >There will be a train freeze in the week of Sept 10th and Oct 8th. > > > > > > > > > During the failover, some schema changes will be finalized on the current > > > active datacenter (plus some major server and network maintenance may be > > > done)- our request is mostly to refrain from quickly enabling those large > > > new unlocked features (e.g. the ongoing comment refactoring, actor/user > > > refactoring, Multi Content Revision, JADE, major wikidata or structured > > > comons structure changes, new extensions not ever deployed to the > > cluster, > > > etc.) at the same time than the ongoing maintenance to reduce variables > > of > > > things that can go bad- enabling those features may be unblocked during > > the > > > switchover time, but we ask you to hold until being back on the current > > > active datacenter. Basically, ask yourself if you are enabling a large > > new > > > core feature or want to start a heavy-write maintenance script and there > > is > > > a chance you will need DBA/system support. Sadly, we had some instances > > of > > > this happening last year and we want to explicitly discourage this during > > > these 2 weeks. 
> > > In my own opinion, enabling existing features on smaller projects (size > > here is in amount of server resources, not that they are less important) is > > equivalent to a swat change, and I am not against it happening. I would ask > > contributors to use their best judgement on every case, and ask people on > > the #DBA tag on phabricator or add me as reviewers on gerrit if in doubt. > > My plea is to not enable major structural changes during that time that may > > affect thousands of edits per minute. Swat-like changes and "boring" :-) > > trains are ok. > > > > For new wiki creations I would prefer if those were delayed but CC #DBA s > > on the phabricator task to check with us. > > ___ > > Wikitech-l mailing list > > Wikitech-l@lists.wikimedia.org > > https://lists.wikimedia.org/m
Re: [Wikitech-l] Datacenter switchover and switchback
Reminder: The switchback is next week. On Wed, Aug 29, 2018 at 8:28 PM Alexandros Kosiaris wrote: > Hello everyone, > > This is to inform you that there will be a datacenter switchover and > switchback in the next few weeks. The timelines are > > Services: Tuesday, September 11th 2018 14:30 UTC > Media storage/Swift: Tuesday, September 11th 2018 15:00 UTC > Traffic: Tuesday, September 11th 2018 19:00 UTC > MediaWiki: Wednesday, September 12th 2018: 14:00 UTC > > Switchback: > > Traffic: Wednesday, October 10th 2018 09:00 UTC > MediaWiki: Wednesday, October 10th 2018: 14:00 UTC > Services: Thursday, October 11th 2018 14:30 UTC > Media storage/Swift: Thursday, October 11th 2018 15:00 UTC > > For the duration of the switchover (1 month), deployers are kindly > requested to refrain from large db schema changes and avoid deploying > any kind of new feature that requires creation of tables. > There will be a train freeze in the week of Sept 10th and Oct 8th. > > The net effect of the switchover and switchback for volunteers is > expected to be some minutes of inability to save an edit. For readers, > everything will be as usual. > > The tracking task for interested parties is > https://phabricator.wikimedia.org/T199073 > > Regards, > > -- > Alexandros Kosiaris -- Alexandros Kosiaris ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Datacenter Switchover recap
Hello all, Today we've successfully migrated our wikis (MediaWiki and associated services) from our primary data center (eqiad) to our secondary (codfw), an exercise we've done for the 3rd year in a row. During the most critical part of the switch today, the wikis were in read-only mode for a duration of 7 and a half minutes - a significant improvement from last year. Although the switchover process itself has been largely automated and went pretty smoothly once started, we did experience some issues leading up to our maintenance window, which caused us to delay the switch somewhat: - In the days before the switch, a performance issue in the Translate extension for CentralNotice had been discovered, which was expected to cause database stampede issues during the switch, and we decided to mitigate this by temporarily disabling the extension for the duration of the switchover process. However, it's now understood that this may have caused some unwanted side effects and should be avoided in the future in favor of other methods. - Right before the switchover commenced, an eqiad Varnish server misbehaved, causing a high spike of failed requests. Thankfully the SRE Traffic team identified and addressed the issue promptly, allowing the switchover to proceed. - Two codfw s7 database slaves crashed right before the start of our maintenance window. This delayed the start of our switchover procedure by approximately 30 minutes into our maintenance window while we investigated cause and impact. - The ElasticSearch search cluster traffic did not follow MediaWiki traffic from eqiad to codfw during the switch as was expected, but stayed in our primary data center instead. Investigation showed that ElasticSearch had been manually hardcoded to eqiad in its configuration. This was rectified after the switchover was complete, with a configuration change and a manual switch to codfw. - After the switchover completed, we experienced some repetitive database load spikes, primarily on the codfw s1 cluster (serving English Wikipedia). The DBA team performed a series of fine-tuning and other corrective actions. All wikis are now served from our secondary codfw data center, and this is expected to stay that way for the next 4 weeks, when we will reverse this procedure. Should you experience any issue that is deemed related to the switchover process, please feel free to file a ticket in Phabricator and tag it with the Datacenter-Switchover-2018 project tag[1]. We will monitor this tag closely and keep any and all issues updated. We'd like to thank everyone for their hard work in ensuring any (potential) issues got resolved in a timely manner, for automating the process whenever and wherever possible, and for making this datacenter switch a success! [1] https://phabricator.wikimedia.org/project/profile/3571/ -- Alexandros Kosiaris ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Datacenter switchover and switchback
On Thu, Aug 30, 2018 at 12:55 PM Derk-Jan Hartman wrote: > > While I think these regular switches are a very good idea, from an outside > perspective I do have to question a process that puts a significant plug in > the velocity of various teams working on major projects (esp. in a time of > year that could probably be seen as one of the most productive). That is absolutely true. > What are > plans to reduce the disruption of this exercise in the future ? There are indeed plans to make this easier, more automated and shorter in duration. Just to name a few: MediaWiki no longer requires deployments to switch datacenters but instead relies on a state key in the etcd database; there is a library plus cookbooks to automate the currently automatable steps; and there is work to update the documentation and make it up to date, accurate and more usable by more people at shorter notice, and so on. That being said, it's never going to be free, but the toil is significant enough to make us want to reduce it, and that is what we aim for. > > DJ > > On Thu, Aug 30, 2018 at 8:38 AM Jaime Crespo wrote: > > > Let me explain the rationale of the below request for clarification: > > > > On Wed, Aug 29, 2018 at 11:30 PM MA wrote: > > > > > Hello: > > > > > > >For the duration of the switchover (1 month), deployers are kindly > > > >requested to refrain from large db schema changes and avoid deploying > > > >any kind of new feature that requires creation of tables. > > > >There will be a train freeze in the week of Sept 10th and Oct 8th. > > > > > > During the failover, some schema changes will be finalized on the current > > active datacenter (plus some major server and network maintenance may be > > done)- our request is mostly to refrain from quickly enabling those large > > new unlocked features (e.g. the ongoing comment refactoring, actor/user > > refactoring, Multi Content Revision, JADE, major wikidata or structured > > Commons structure changes, new extensions not ever deployed to the cluster, > > etc.) at the same time as the ongoing maintenance to reduce variables of > > things that can go bad- enabling those features may be unblocked during the > > switchover time, but we ask you to hold until being back on the current > > active datacenter. Basically, ask yourself if you are enabling a large new > > core feature or want to start a heavy-write maintenance script and there is > > a chance you will need DBA/system support. Sadly, we had some instances of > > this happening last year and we want to explicitly discourage this during > > these 2 weeks. > > > > In my own opinion, enabling existing features on smaller projects (size > > here is in amount of server resources, not that they are less important) is > > equivalent to a swat change, and I am not against it happening. I would ask > > contributors to use their best judgement on every case, and ask people on > > the #DBA tag on phabricator or add me as a reviewer on gerrit if in doubt. > > My plea is to not enable major structural changes during that time that may > > affect thousands of edits per minute. Swat-like changes and "boring" :-) > > trains are ok. > > > > For new wiki creations I would prefer if those were delayed but CC #DBA s > > on the phabricator task to check with us.
> > _______ > > Wikitech-l mailing list > > Wikitech-l@lists.wikimedia.org > > https://lists.wikimedia.org/mailman/listinfo/wikitech-l > ___ > Wikitech-l mailing list > Wikitech-l@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/wikitech-l -- Alexandros Kosiaris ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
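To make the state-key approach mentioned above a bit more concrete, here is a minimal Python sketch of an application reading its primary datacenter from etcd at runtime rather than baking it into deployed configuration. The etcd endpoint, the key name and the value format below are invented for illustration; they are not the actual WMF setup.

import requests

ETCD_BASE = "https://etcd.example.org:2379"          # hypothetical endpoint
STATE_KEY = "/v2/keys/mediawiki/master-datacenter"   # hypothetical key name

def current_primary_dc(default="eqiad"):
    """Return the primary datacenter recorded in etcd, or a fallback."""
    try:
        resp = requests.get(ETCD_BASE + STATE_KEY, timeout=2)
        resp.raise_for_status()
        # The etcd v2 API wraps the value as {"node": {"key": ..., "value": ...}}
        return resp.json()["node"]["value"]
    except requests.RequestException:
        # If etcd is unreachable, keep using the last known/default DC.
        return default

if __name__ == "__main__":
    print("Writes should be routed to the %s masters" % current_primary_dc())

The point of the pattern is that flipping the value of a single key changes where writes are directed, with no code deployment involved.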
[Wikitech-l] Datacenter switchover and switchback
Hello everyone, This is to inform you that there will be a datacenter switchover and switchback in the next few weeks. The timelines are:
Services: Tuesday, September 11th 2018 14:30 UTC
Media storage/Swift: Tuesday, September 11th 2018 15:00 UTC
Traffic: Tuesday, September 11th 2018 19:00 UTC
MediaWiki: Wednesday, September 12th 2018 14:00 UTC
Switchback:
Traffic: Wednesday, October 10th 2018 09:00 UTC
MediaWiki: Wednesday, October 10th 2018 14:00 UTC
Services: Thursday, October 11th 2018 14:30 UTC
Media storage/Swift: Thursday, October 11th 2018 15:00 UTC
For the duration of the switchover (1 month), deployers are kindly requested to refrain from large db schema changes and to avoid deploying any kind of new feature that requires the creation of tables. There will be a train freeze in the weeks of Sept 10th and Oct 8th. The net effect of the switchover and switchback for volunteers is expected to be some minutes of inability to save an edit. For readers, everything will be as usual. The tracking task for interested parties is https://phabricator.wikimedia.org/T199073 Regards, -- Alexandros Kosiaris ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] grafana.wikimedia.org migration to the native LDAP implementation on June 27th
Hello everyone, The migration is complete. Now grafana.wikimedia.org supports the "Sign-in" action (accessible from the top-left button) that allows editing dashboards by providing LDAP credentials. grafana-admin.wikimedia.org is deprecated, no longer works as it used to, and will be removed at an unspecified later point in time. I would urge everyone to update bookmarks, links and anything else using it to now use grafana.wikimedia.org. Don't hesitate to reach out on https://phabricator.wikimedia.org/T170150 if you experience any problems. Regards, On Wed, Jun 27, 2018 at 12:06 PM Alexandros Kosiaris wrote: > > FYI, this is happening today. > On Wed, Jun 13, 2018 at 2:14 PM Alexandros Kosiaris > wrote: > > > > Hello everyone, > > > > If you don't edit/create dashboards on grafana.wikimedia.org feel free > > to skip the rest of this email, it does not affect you. Those of you > > who do, please read on. > > > > Our grafana installation dates back quite a bit and it has a custom > > LDAP authentication and authorization implementation. Nowadays Grafana > > has native support for LDAP and can query LDAP groups. After some > > lengthy evaluation and problem solving in > > https://phabricator.wikimedia.org/T170150, the SRE team feels ready to > > deprecate the custom LDAP implementation and migrate to the native > > one. What does this mean? > > > > * grafana-admin.wikimedia.org (the entry point for users to > > authenticate and create dashboards) will be removed > > * grafana.wikimedia.org WILL remain as is and will only very slightly > > change, specifically the login menu part (see below) > > * You will be logging in using a menu in the left hand side of > > grafana.wikimedia.org using the exact same credentials as before. > > * The grafana database user attributes email and name will be > > automatically populated on login from LDAP and enforced. This is > > expected to be irrelevant for almost everyone, but if you've edited > > your profile manually in grafana expect those 2 fields to be overridden > > with data from LDAP. > > > > The date for the migration is June 27th 09:00 UTC and the maintenance > > window will be 3 hours long. We don't expect it to really last that > > long however. Expect grafana.wikimedia.org downtime during that > > timeframe. > > > > Regards, > > > > -- > > Alexandros Kosiaris > > > > -- > Alexandros Kosiaris -- Alexandros Kosiaris ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] grafana.wikimedia.org migration to the native LDAP implementation on June 27th
FYI, this is happening today. On Wed, Jun 13, 2018 at 2:14 PM Alexandros Kosiaris wrote: > > Hello everyone, > > If you don't edit/create dashboards on grafana.wikimedia.org feel free > to skip the rest of this email, it does not affect you. Those of you > who do, please read on. > > Our grafana installation dates back quite a bit and it has a custom > LDAP authentication and authorization implementation. Nowadays Grafana > has native support for LDAP and can query LDAP groups. After some > lengthy evaluation and problem solving in > https://phabricator.wikimedia.org/T170150, the SRE team feels ready to > deprecate the custom LDAP implementation and migrate to the native > one. What does this mean? > > * grafana-admin.wikimedia.org (the entry point for users to > authenticate and create dashboards) will be removed > * grafana.wikimedia.org WILL remain as is and will only very slightly > change, specifically the login menu part (see below) > * You will be logging in using a menu in the left hand side of > grafana.wikimedia.org using the exact same credentials as before. > * The grafana database user attributes email and name will be > automatically populated on login from LDAP and enforced. This is > expected to be irrelevant for almost everyone, but if you've edited > your profile manually in grafana expect those 2 fields to be overridden > with data from LDAP. > > The date for the migration is June 27th 09:00 UTC and the maintenance > window will be 3 hours long. We don't expect it to really last that > long however. Expect grafana.wikimedia.org downtime during that > timeframe. > > Regards, > > -- > Alexandros Kosiaris -- Alexandros Kosiaris ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] grafana.wikimedia.org migration to the native LDAP implementation on June 27th
Hello everyone, If you don't edit/create dashboards on grafana.wikimedia.org feel free to skip the rest of this email, it does not affect you. Those of you who do, please read on. Our grafana installation dates back quite a bit and it has a custom LDAP authentication and authorization implementation. Nowadays Grafana has native support for LDAP and can query LDAP groups. After some lengthy evaluation and problem solving in https://phabricator.wikimedia.org/T170150, the SRE team feels ready to deprecate the custom LDAP implementation and migrate to the native one. What does this mean?
* grafana-admin.wikimedia.org (the entry point for users to authenticate and create dashboards) will be removed
* grafana.wikimedia.org WILL remain as is and will only very slightly change, specifically the login menu part (see below)
* You will be logging in using a menu on the left-hand side of grafana.wikimedia.org using the exact same credentials as before.
* The grafana database user attributes email and name will be automatically populated on login from LDAP and enforced. This is expected to be irrelevant for almost everyone, but if you've edited your profile manually in grafana, expect those 2 fields to be overridden with data from LDAP.
The date for the migration is June 27th 09:00 UTC and the maintenance window will be 3 hours long. We don't expect it to really last that long, however. Expect grafana.wikimedia.org downtime during that timeframe. Regards, -- Alexandros Kosiaris ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
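For readers wondering what "native support for LDAP and can query LDAP groups" amounts to in practice, the following Python sketch (using the ldap3 library) shows the kind of lookup such an integration performs on login. The server, base DN, group names and role mapping are invented for the example; this is not Grafana's actual configuration, which lives in its own LDAP config file.

from ldap3 import ALL, Connection, Server

LDAP_SERVER   = "ldap.example.org"                  # hypothetical server
USER_BASE_DN  = "ou=people,dc=example,dc=org"       # hypothetical base DN
GROUP_TO_ROLE = {                                   # hypothetical group/role mapping
    "cn=ops,ou=groups,dc=example,dc=org": "Admin",
    "cn=nda,ou=groups,dc=example,dc=org": "Editor",
}

def resolve_user(username, bind_dn, bind_password):
    """Look up a user's name, mail and group-derived role in LDAP."""
    server = Server(LDAP_SERVER, use_ssl=True, get_info=ALL)
    conn = Connection(server, user=bind_dn, password=bind_password, auto_bind=True)
    conn.search(USER_BASE_DN, "(uid=%s)" % username,
                attributes=["cn", "mail", "memberOf"])
    if not conn.entries:
        return None
    entry = conn.entries[0]
    groups = [str(g) for g in entry.memberOf]
    role = next((r for g, r in GROUP_TO_ROLE.items() if g in groups), "Viewer")
    # Name and mail come from LDAP on every login, which is why any locally
    # edited profile fields get overridden after the migration.
    return {"name": str(entry.cn), "email": str(entry.mail), "role": role}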
Re: [Wikitech-l] [Wikimedia-l] Τι σας κάνει ευτυχισμένη αυτήν την εβδομάδα? / What's making you happy this week? (Week of 22 April 2018)
Aside from the actual content (it's nice to see the legal case in Greece ending), seeing the subject in Greek was one more reason to be happy. But I think a small correction is in order. A more appropriate way of saying "What's making you happy this week?" would be "Τι σας κάνει ευτυχείς αυτήν την εβδομάδα;", where "ευτυχείς" (an adjective) is the plural form of "happy", which is used both when addressing groups of people and when being polite. Alternatively, "ευτυχισμένους" could be used, with the exact same meaning, just using the participle form (in the appropriate conjugation) instead of the adjective. "Xαρούμενους" (again a participle, just of a different verb) would also be valid with the same meaning for most people, although if one wants to be pedantic, "χαρά" is closer to "joy" than "happiness". Regards, -- Alexandros Kosiaris <akosia...@wikimedia.org> ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Reboot of irc.wikimedia.org
Hello everyone, This has just happened. Most bots seem to have already reconnected successfully and so has the rc-pmtpa bot. Everything seems to be in order. Regards, On Thu, Jan 25, 2018 at 9:18 PM, Alexandros Kosiaris <akosia...@wikimedia.org> wrote: > Hi, > > On Sun, Jan 21, 2018 at 9:32 PM, MZMcBride <z...@mzmcbride.com> wrote: >> Alexandros Kosiaris wrote: >>>This is to inform you that on Monday Feb 22nd 2018 ~10:00 UTC, the >>>infrastructure powering irc.wikimedia.org will be rebooted for >>>security upgrades. This is expected to only impact bots that are using >>>irc.wikimedia.org AND are not able to automatically reconnect on >>>connection failure. From recent experience (the equipment was rebooted >>>210 days ago last time, with no fallout) those are very limited in number >>>these days. >> >> Thank you for this notice. > > Sorry for not answering sooner, I've just seen this. > >> >> Do you know if it's still the case that per-wiki channels on >> irc.wikimedia.org are not available/able to be joined until there's >> activity on that wiki? > > I did not know that, but I just researched it and that is still the > case. For future reference, the code in question is at > https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/mw_rc_irc/files/udpmxircecho.py;b2d13fbc838a7f994a2e97fdffc07f15fe88bc29$49 > > >> My memory is that reconnecting to irc.wikimedia.org >> has not usually been the issue (as you note), but instead it's been that >> there's a speaking bot account on the network that would only create a >> channel after there's some activity to report about that wiki. For larger >> wikis with lots of activity, this means the channels get created nearly >> instantly after a server restart. For smaller wikis with little activity, >> this means that the channels may not get re-created for days or even weeks. > > I have no such recollection but it does make sense. > >> >> I just tested irc.wikimedia.org again and it appears that joining/creating >> arbitrary channels is not allowed. This makes me think that bot accounts >> and others would be disallowed from joining small/quiet wiki channels >> until those channels are re-created by the server/rc-pmtpa, unless some >> kind of whitelist or workaround has been implemented. > > That's true. But I don't expect this to cause any kind of major > problems and experience up to now supports that. But thanks for > bringing it up. I learned something today. > > -- > Alexandros Kosiaris <akosia...@wikimedia.org> -- Alexandros Kosiaris <akosia...@wikimedia.org> ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Reboot of irc.wikimedia.org
Hi, On Sun, Jan 21, 2018 at 9:32 PM, MZMcBride <z...@mzmcbride.com> wrote: > Alexandros Kosiaris wrote: >>This is to inform you that on Monday Feb 22nd 2018 ~10:00 UTC, the >>infrastructure powering irc.wikimedia.org will be rebooted for >>security upgrades. This is expected to only impact bots that are using >>irc.wikimedia.org AND are not able to automatically reconnect on >>connection failure. From recent experience (the equipment was rebooted >>210 days ago last time, with no fallout) those are very limited in number >>these days. > > Thank you for this notice. Sorry for not answering sooner, I've just seen this. > > Do you know if it's still the case that per-wiki channels on > irc.wikimedia.org are not available/able to be joined until there's > activity on that wiki? I did not know that, but I just researched it and that is still the case. For future reference, the code in question is at https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/mw_rc_irc/files/udpmxircecho.py;b2d13fbc838a7f994a2e97fdffc07f15fe88bc29$49 > My memory is that reconnecting to irc.wikimedia.org > has not usually been the issue (as you note), but instead it's been that > there's a speaking bot account on the network that would only create a > channel after there's some activity to report about that wiki. For larger > wikis with lots of activity, this means the channels get created nearly > instantly after a server restart. For smaller wikis with little activity, > this means that the channels may not get re-created for days or even weeks. I have no such recollection but it does make sense. > > I just tested irc.wikimedia.org again and it appears that joining/creating > arbitrary channels is not allowed. This makes me think that bot accounts > and others would be disallowed from joining small/quiet wiki channels > until those channels are re-created by the server/rc-pmtpa, unless some > kind of whitelist or workaround has been implemented. That's true. But I don't expect this to cause any kind of major problems and experience up to now supports that. But thanks for bringing it up. I learned something today. -- Alexandros Kosiaris <akosia...@wikimedia.org> ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Reboot of irc.wikimedia.org
Hello everyone, This is to inform you that on Monday Feb 22nd 2018 ~10:00 UTC, the infrastructure powering irc.wikimedia.org will be rebooted for security upgrades. This is expected to only impact bots that are using irc.wikimedia.org AND are not able to automatically reconnect on connection failure. From recent experience (the equipment was rebooted 210 days ago last time, with no fallout) those are very limited in number these days. Regards, -- Alexandros Kosiaris <akosia...@wikimedia.org> ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
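For bot authors, here is a bare-bones Python sketch of the reconnect behaviour the announcement assumes. The nick is made up, the channel name just follows the usual per-wiki convention, and this is deliberately not a full IRC client; the reconnect-with-backoff loop is the part that matters.

import socket
import time

HOST, PORT = "irc.wikimedia.org", 6667
CHANNEL = "#en.wikipedia"

def listen_forever(nick="rc-listener-example"):
    backoff = 1
    while True:
        try:
            with socket.create_connection((HOST, PORT), timeout=300) as sock:
                sock.sendall(f"NICK {nick}\r\nUSER {nick} 0 * :{nick}\r\n".encode())
                sock.sendall(f"JOIN {CHANNEL}\r\n".encode())
                backoff = 1                      # connected, reset the retry delay
                buf = b""
                while True:
                    data = sock.recv(4096)
                    if not data:                 # server went away (e.g. a reboot)
                        raise ConnectionError("connection closed")
                    buf += data
                    while b"\r\n" in buf:
                        line, buf = buf.split(b"\r\n", 1)
                        if line.startswith(b"PING"):
                            sock.sendall(b"PONG" + line[4:] + b"\r\n")
                        else:
                            print(line.decode("utf-8", "replace"))
        except OSError:
            time.sleep(backoff)                  # reconnect with exponential backoff
            backoff = min(backoff * 2, 60)

if __name__ == "__main__":
    listen_forever()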
Re: [Wikitech-l] Setting up multiple Parsoid servers behind load balancer
Hi James, I don't know if you have noticed the following in C. Scott's response > At any rate: in your configurations you have URL and HTTPProxy set to the > exact same string. This is almost certainly not right. I believe if you > just omit the proxy lines entirely from the configuration you'll find > things work as you expect. > --scott but I could not help but notice the error too. AFAIK setting these variables instructs both pieces of software to use http://192.168.56.63:8001/ as a forward proxy, which is NOT what you have there. HAProxy is reverse proxy software, not a forward proxy (although you can abuse it to achieve that functionality). In the setup you describe there is no need for forward proxies, so neither Parsoid nor MediaWiki needs a proxy configuration. I also don't think you need RESTBase as long as you are willing to wait for Parsoid to finish parsing and return the result. It should be fine for small articles, but as these grow larger, you will start having various performance-related problems (for example you might have to adjust HAProxy timeouts). But from what I gather, you are not there yet. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
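To put the reverse-vs-forward proxy distinction into code, here is a small Python sketch; the HAProxy address comes from the thread, while the request path is illustrative and not necessarily the exact Parsoid API of that version.

import requests

HAPROXY_FRONTEND = "http://192.168.56.63:8001"   # the load balancer from the thread

# Correct: a reverse proxy is simply *the* service URL from the client's point
# of view; HAProxy accepts the request and fans it out to a Parsoid backend.
resp = requests.get(HAPROXY_FRONTEND + "/localhost/v3/page/html/Main_Page")
print(resp.status_code)

# Incorrect (what the misconfiguration amounted to): treating the same address
# as a *forward* proxy, i.e. asking it to fetch some other URL on the client's
# behalf, which this HAProxy is not set up to do.
# requests.get("http://some-parsoid-backend:8000/...",
#              proxies={"http": HAPROXY_FRONTEND})

The same logic applies to the settings discussed in the thread: point the URL options at the HAProxy frontend and leave the HTTPProxy-style options unset.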
Re: [Wikitech-l] Setting up a new Tomcat servlet in production?
On Thu, Oct 20, 2016 at 10:20 AM, 魔法設計師 <shoichi.c...@gmail.com> wrote: > 2016-10-19 0:45 GMT+08:00 Alexandros Kosiaris <akosia...@wikimedia.org>: >> >> Hello, >> >> With the preamble of my opinion not being an authoritative point of >> view at all, I should point out that Java/JVM based services are not >> especially loved in WMF. Ops does not feel it has the capability of >> supporting them. There are a few around like Gerrit, Cassandra, >> ElasticSearch, Kafka but none of these is actually maintained by ops. >> All of these have owners/maintainers outside of ops (entire teams in >> some cases), with varying degrees of success. The question of whether >> it should be Tomcat or Jetty, is a valid one, but serves to alleviate >> only part of the problem (it's not like Ops hate tomcat but like >> Jetty). So, there are probably a few social/administrative issues that >> it might make sense to address first before handling the technical >> part. > > I think what you mean is: if the service is online, for administration > a maintainer or a team is needed, just like Gerrit, Cassandra, and > ElasticSearch have. Not maintained by Ops. The team needs to be set up > first. Am I right? Yes, that's the very least IMHO. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Setting up a new Tomcat servlet in production?
Hello, With the preamble of my opinion not being an authoritative point of view at all, I should point out that Java/JVM based services are not especially loved in WMF. Ops does not feel it has the capability of supporting them. There are a few around like Gerrit, Cassandra, ElasticSearch, Kafka but none of these is actually maintained by ops. All of these have owners/maintainers outside of ops (entire teams in some cases), with varying degrees of success. The question of whether it should be Tomcat or Jetty, is a valid one, but serves to alleviate only part of the problem (it's not like Ops hate tomcat but like Jetty). So, there are probably a few social/administrative issues that it might make sense to address first before handling the technical part. On Mon, Oct 17, 2016 at 12:13 PM, Adam Wight <awi...@wikimedia.org> wrote: > Friends, > > I'm helping review a tool <https://www.mediawiki.org/wiki/Extension:Ids> > that I understand Wikimedia Taiwan is eager to use, which uses a parser > hook to render ideographic description characters > <https://en.wikipedia.org/wiki/Ideographic_Description_Characters_(Unicode_block)> > into PNG glyphs in order to display historic or rare characters which > aren't covered by Unicode. It's very cool. > > The challenges are first that it's based on a Tomcat backend > <https://github.com/Wikimedia-TW/han3_ji7_tsoo1_kian3_WM/blob/master/src/idsrend/services/IDSrendServlet.java>, > which I'm not sure is precedented in our current ecosystem, and second that > the code uses Chinese variable and function names, which should > unfortunately be Anglicized by convention, AIUI. Finally, there might be > security issues around the rendered text itself, if it were misused to mask > content. > > I'm mostly asking this list for help with the question of using Tomcat in > production. > > Thanks, > Adam > ___ > Wikitech-l mailing list > Wikitech-l@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/wikitech-l -- Alexandros Kosiaris <akosia...@wikimedia.org> ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [Engineering] [Analytics] Pageview API
Hello Bahodir, On Tue, Nov 17, 2015 at 2:15 PM, Bahodir Mansurov wrote: > Agree with everyone else, this is great! > > I just have a question. Is this an evolving thing in a sense that more data > sources will be used to define page views? Let me give an example. Reading > Web team is working on a new web app prototype that caches pages which can > be viewed without hitting the back end. Since no request is made, no page > view will be recorded. This got me intrigued. I am wondering what exactly is meant by that. Got something (wikipage, doc, something...) a curious being like me could read? ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [Engineering] Pageview API
It's nice to finally see this go live. Great work guys!! On Mon, Nov 16, 2015 at 11:50 PM, Dan Andreescu <dandree...@wikimedia.org> wrote: > Dear Data Enthusiasts, > > > In collaboration with the Services team, the analytics team wishes to > announce a public Pageview API. For an example of what kind of UIs someone > could build with it, check out this excellent demo (code). > > > The API can tell you how many times a wiki article or project is viewed over > a certain period. You can break that down by views from web crawlers or > humans, and by desktop, mobile site, or mobile app. And you can find the > 1000 most viewed articles on any project, on any given day or month that we > have data for. We currently have data back through October and we will be > able to go back to May 2015 when the loading jobs are all done. For more > information, take a look at the user docs. > > > After many requests from the community, we were really happy to finally make > this our top priority and get it done. Huge thanks to Gabriel, Marko, Petr, > and Eric from Services, Alexandros and all of Ops really, Henrik for > maintaining stats.grok, and, of course, the many community members who have > been so patient with us all this time. > > > The Research team’s Article Recommender tool already uses the API to rank > pages and determine relative importance. Wiki Education Foundation’s > dashboard is going to be using it to count how many times an article has > been viewed since a student edited it. And there are other grand plans for > this data like “article finder”, which will find low-rated articles with a > lot of pageviews; this can be used by editors looking for high-impact work. > Join the fun, we’re happy to help get you started and listen to your ideas. > Also, if you find bugs or want to suggest improvements, please create a task > in Phabricator and tag it with #Analytics-Backlog. > > > So what’s next? We can think of too many directions to go into, for > pageview data and Wikimedia project data, in general. We need to work with > you to make a great plan for the next few quarters. Please chime in here > with your needs. > > > Team Analytics > > > ___ > Engineering mailing list > engineer...@lists.wikimedia.org > https://lists.wikimedia.org/mailman/listinfo/engineering > -- Alexandros Kosiaris <akosia...@wikimedia.org> ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
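As a quick illustration of the per-article endpoint, the Python sketch below fetches daily view counts; see the user docs linked in the announcement above for the authoritative list of parameters and allowed values.

import requests

BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews"

def daily_views(project, article, start, end, access="all-access", agent="user"):
    """Return {YYYYMMDD: views} for an article between start and end (YYYYMMDD)."""
    url = f"{BASE}/per-article/{project}/{access}/{agent}/{article}/daily/{start}/{end}"
    resp = requests.get(url, headers={"User-Agent": "pageview-example/0.1"})
    resp.raise_for_status()
    return {item["timestamp"][:8]: item["views"] for item in resp.json()["items"]}

if __name__ == "__main__":
    for day, count in sorted(daily_views("en.wikipedia", "Wikipedia",
                                         "20151101", "20151107").items()):
        print(day, count)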
[Wikitech-l] etherpad.wikimedia.org upgrade on Monday 2015-06-15
Hello everyone, On Monday UTC morning, the software powering etherpad.wikimedia.org, etherpad-lite, will be upgraded to version 1.5.6-2 from 1.4.1-3. This upgrade sets us back on track with etherpad-lite releases. Changelogs for the interested are here: https://github.com/ether/etherpad-lite/blob/develop/CHANGELOG.md The reason for this heads up is that after the upgrade, users will have to force a full refresh in old pads they revisit in order to clear the browser cache. Otherwise in some revisited pads, a corrupted interface will show up with a message about a missing Cookie. The sequence for this is highly dependent on the browser. Ctrl+F5, F5, Command+R depending on browser/OS does the trick most times. Refer to your browser documentation for an accurate shortcut if you don't already know it. Regards, -- Alexandros Kosiaris akosia...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Investigating building an apps content service using RESTBase and Node.js
> Good point. Ideally, what we would need to do is provide the right tools to developers to create services, which can then be placed strategically around DCs (in cooperation with Ops, ofc).

Yes. As an organization we should provide good tools that allow developers to create services. I do fail to understand the "strategically around DCs" part though.

> For v1, however, we plan to provide only logical separation (to a certain extent) via modules which can be dynamically loaded/unloaded from RESTBase.

Modules? Care to explain a bit more? AFAIK RESTBase is a revision storage service and, to be honest, I am struggling to understand what modules you are referring to and the architecture behind those modules.

> In return, RESTBase will provide them with routing, monitoring, caching and authorisation out of the box. The good point here is that this 'modularisation' eases the transition to a more-decomposed orchestration SOA model. Going in that direction, however, requires some prerequisites to be fulfilled, such as [1].

While revision caching can very well be done by RESTBase (AFAIK, that is one of the reasons it is being created), authorization (it's not revision authorization, but generic authentication/authorization I am referring to) and monitoring should not be provided by RESTBase to any service. Especially monitoring. Services (whatever their nature) should provide discoverable (REST if you like, as I suspect you do) endpoints that allow monitoring via third-party tools and not depend on another service for that. My take is that there should be a swagger manifest that describes a basic monitoring framework and services should each independently implement it (including RESTBase). I am also a bit unclear on the routing aspect. Care to point out an up-to-date architectural diagram? I have been told in person that the one at https://www.npmjs.com/package/restbase is not up to date, so I can't comment on that. -- Alexandros Kosiaris akosia...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
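As a sketch of the "each service exposes its own discoverable monitoring endpoints" idea, here is a minimal Python/Flask service that serves both a swagger-style description of itself and a health check. The paths and fields are invented for illustration; they are not an agreed-upon manifest.

from flask import Flask, jsonify

app = Flask(__name__)

# Swagger-style self-description, served by the service itself so that
# third-party monitoring tools can discover what to probe.
SPEC = {
    "swagger": "2.0",
    "info": {"title": "example-service", "version": "0.1.0"},
    "paths": {
        "/_info/healthz": {
            "get": {
                "summary": "Liveness check",
                "responses": {"200": {"description": "service is up"}},
            }
        }
    },
}

@app.route("/_info/spec")
def spec():
    return jsonify(SPEC)

@app.route("/_info/healthz")
def healthz():
    # A real service would verify its own dependencies (storage, queues, ...)
    # here rather than relying on another service to do it on its behalf.
    return jsonify({"status": "ok"})

if __name__ == "__main__":
    app.run(port=8080)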
Re: [Wikitech-l] Current status of IPv6 connectivity?
Hello,

> Are there any reports that could allow one to check the status of IPv6 connectivity in the WMF cluster? If not, could you let me know if IPv6 is deployed on all servers and if you consider it stable?

IPv6 is mostly deployed on a per-server/service basis right now, but we do consider it stable. Unfortunately there is no way right now to check the status of IPv6 connectivity, but there is a Tech Ops goal of better monitoring which includes plans to make IPv6 connectivity checking a first-class citizen (same as IPv4).

> I'm seeing huge variations in IPv6 performance when connecting to WMF servers, while the rest of the internet seems to work and I'm trying to determine if the problem is on my side, somewhere on the way, or in the WMF network.

Well, tools like mtr or an IPv6-enabled traceroute could be of immense help there. I must say, I've not found a performance problem with IPv6 connectivity in the WMF cluster yet. Feel free to provide us with more information, though, if you have persistent problems. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
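mtr or traceroute6 output is still the most useful thing to attach to a report, but a quick probe like the following Python sketch can at least tell whether IPv6 TCP connections succeed at all from a given vantage point; the host list and port are illustrative.

import socket
import time

def probe_v6(host, port=443, timeout=5):
    """Time a TCP connect over IPv6 only; return seconds, or None on failure."""
    start = time.monotonic()
    try:
        infos = socket.getaddrinfo(host, port, socket.AF_INET6, socket.SOCK_STREAM)
        family, socktype, proto, _, sockaddr = infos[0]
        with socket.socket(family, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(sockaddr)
        return time.monotonic() - start
    except OSError:
        return None

if __name__ == "__main__":
    for host in ("en.wikipedia.org", "upload.wikimedia.org"):
        rtt = probe_v6(host)
        print(host, "unreachable over IPv6" if rtt is None else "%.1f ms" % (rtt * 1000))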
Re: [Wikitech-l] Introducing Math rendering 2.0
Really happy to see this going live! On my part, many thanks to Moritz for pushing this forward and being such a cool person to work with, and of course the rest of the team for helping push out such a cool service :-) On Thu, Oct 23, 2014 at 11:11 PM, Gabriel Wicke gwi...@wikimedia.org wrote: Dear Wikipedians, We'd like to announce a major update of the Math (rendering) extension. For registered Wikipedia users, we have introduced a new math rendering mode using MathML, a markup language for mathematical formulae. Since MathML is not supported in all browsers [1], we have also added a fall-back mode using scalable vector graphics (SVG). Both modes offer crisp rendering at any resolution, which is a major advantage over the current image-based default. We'll also be able to make our math more accessible by improving screenreader and magnification support. We encourage you to enable the MathML mode in your Appearance preferences. As an example, the URL for this section on the English Wikipedia is: https://en.wikipedia.org/wiki/Special:Preferences#mw-prefsection-rendering For editors, there are also two new optional features: 1) You can set the id attribute to create math tags that can be referenced. For example, the following math tag <math id=MassEnergyEquivalence>E=mc^2</math> can be referenced by the wikitext [[#MassEnergyEquivalence|mass energy equivalence]] This is true regardless of the rendering mode used. 2) In addition, there is the attribute display with the possible values block or inline. This attribute can be used to control the layout of the math tag with regard to centering and size of the operators. See https://www.mediawiki.org/wiki/Extension:Math/Displaystyle for a full description of this feature. Your feedback is very welcome. Please report bugs in Bugzilla against the Math extension, or post on the talk page here: https://www.mediawiki.org/wiki/Extension_talk:Math All this is brought to you by Moritz Schubotz and Frédéric Wang (both volunteers) in collaboration with Gabriel Wicke, C. Scott Ananian, Alexandros Kosiaris and Roan Kattouw from the Wikimedia Foundation. We also owe a big thanks to Peter Krautzberger and Davide P. Cervone of MathJax for the server-side math rendering backend. Best, Gabriel Wicke (GWicke) and Moritz Schubotz (Physikerwelt) [1]: Currently MathML is supported by Firefox and other Gecko-based browsers, and accessibility tools like Apple's VoiceOver. There is also partial support in WebKit. ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Deprecation notice for etherpad-old.wikimedia.org
Hello everyone, This has just taken place. etherpad-old.wikimedia.org no longer exists. On Tue, Nov 26, 2013 at 6:09 PM, Alexandros Kosiaris akosia...@wikimedia.org wrote: Hello, As many of you might be aware, etherpad.wikimedia.org was migrated a couple of months ago from the old and no longer supported etherpad software to the new, actively supported, etherpad-lite software. The move also involved the migration of pads from the old software to the new, a process which was quite successful, albeit not without glitches. Since then the old installation has been kept around under http://etherpad-old.wikimedia.org in a read-only state in order to allow people to access pads that, for whatever reason, may not have made it to the new installation unscathed (or at all). This service will be discontinued and taken offline on Monday, 30 December 2013. That gives people 30+ days to copy out any necessary pads. With that in mind, the operations team would like to remind everyone that etherpad.wikimedia.org was never intended to be permanent storage for pads. Preservation of a pad is up to the people interested in preserving that pad in another format. Regards, -- Alexandros Kosiaris akosia...@wikimedia.org -- Alexandros Kosiaris akosia...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Deprecation notice for etherpad-old.wikimedia.org
Hello, As many of you might be aware, etherpad.wikimedia.org was migrated a couple of months ago from the old and no longer supported etherpad software to the new, actively supported, etherpad-lite software. The move also involved the migration of pads from the old software to the new, a process which was quite successful, albeit not without glitches. Since then the old installation has been kept around under http://etherpad-old.wikimedia.org in a read-only state in order to allow people to access pads that, for whatever reason, may not have made it to the new installation unscathed (or at all). This service will be discontinued and taken offline on Monday, 30 December 2013. That gives people 30+ days to copy out any necessary pads. With that in mind, the operations team would like to remind everyone that etherpad.wikimedia.org was never intended to be permanent storage for pads. Preservation of a pad is up to the people interested in preserving that pad in another format. Regards, -- Alexandros Kosiaris akosia...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Fwd: [wikimedia #6138] etherpad.wikimedia.org downtime due to upgrade
FYI, -- Forwarded message -- From: Core operations via RT core-...@rt.wikimedia.org Date: Thu, Oct 31, 2013 at 12:18 PM Subject: [wikimedia #6138] etherpad.wikimedia.org downtime due to upgrade To: akosia...@wikimedia.org Scheduling a downtime for etherpad.wikimedia.org on Wednesday 06/11/2013 in order to upgrade it to the latest released version. The downtime is scheduled to last one (1) hour and will start at 09:00 UTC. We will be upgrading from 1.2.11 (released 3 months ago) to 1.3 (released 10 days ago). A package will be created and made available on apt.wikimedia.org during the upgrade. This upgrade will reportedly solve some issues with pad corruption experienced by the Language Engineering team. -- (ticket has been created) -- Alexandros Kosiaris akosia...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Fwd: [wikimedia #5912] Upgrade PHP throughout the cluster to 5.3.10-1ubuntu3.8+wmf1 on Thursday 2013-10-08
FYI -- Forwarded message -- From: Core operations via RT core-...@rt.wikimedia.org Date: Tue, Oct 8, 2013 at 12:14 PM Subject: [wikimedia #5912] Upgrade PHP throughout the cluster to 5.3.10-1ubuntu3.8+wmf1 on Thursday 2013-10-08 To: akosia...@wikimedia.org Scheduling an upgrade of PHP throughout the cluster from 5.3.10-1ubuntu3.6+wmf1 to 5.3.10-1ubuntu3.8+wmf1. The changes are three CVEs (CVE-2013-4635, CVE-2013-4113, CVE-2013-4248) and bug #63055 per RT #5209 (which was not solved due to one more bug). The packages have been built and tested on beta and test.wikipedia.org and no problems have arisen. The upgrade is expected to not be noticeable. -- (ticket has been created) -- Alexandros Kosiaris akosia...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Fwd: [wikimedia #5841] etherpad.wikimedia.org downtime due to upgrade
Hello Federico, Some (but not all) of the etherpad links you provided do not have any content. There are two possibilities. If they are new enough (after 2013-08-19) it might be a bug in etherpad-lite. In that case we should wait for the upgrade, try to reproduce it and file a bug report with the authors of the software. I am afraid the content of the pad is probably lost. If they are old enough (before 2013-08-19), there probably was a problem with the migration from the old etherpad to the new one. The scripts provided by the authors to do the migration were buggy enough to justify such a problem. We did have to copy some pads manually, however the ones you list were not among them. In that case it is still possible to recover the content of the pad from the old etherpad installation (we have kept it around). All you need to do is access it via http://etherpad-old.wikimedia.org instead of http://etherpad.wikimedia.org On Fri, Sep 27, 2013 at 10:53 PM, Federico Leva (Nemo) nemow...@gmail.com wrote: What should I do if a number of etherpads seem to be completely gone? Can someone click on any of the pad links on https://meta.wikimedia.org/wiki/Wikimedia_Conference_2013/Schedule/Saturday and tell me if they are able to see some content? Thanks. Nemo ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l -- Alexandros Kosiaris akosia...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Fwd: [wikimedia #5841] etherpad.wikimedia.org downtime due to upgrade
FYI, -- Forwarded message -- From: core-...@rt.wikimedia.org Date: Mon, Sep 30, 2013 at 2:12 PM Subject: Fwd: [wikimedia #5841] etherpad.wikimedia.org downtime due to upgrade etherpad-lite has been successfully updated to version 1.2.11. The upgrade procedure server-wise was uneventful, however it will cause some minor problems for existing users of the service. Specifically, CSS/JS elements of the page have changed and need to be re-downloaded by the browser; however, due to browser caching this does not happen automatically. Users of the old version will have to FORCE REFRESH their browser when accessing the service for the first time. Otherwise they will get garbled versions of the user interface. Pad contents will be intact, however a brief message suggesting the user does not have permission to access a pad might show up. That message is inaccurate and is a by-product of the garbled UI. -- Alexandros Kosiaris akosia...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Fwd: [wikimedia #5841] etherpad.wikimedia.org downtime due to upgrade
FYI -- Forwarded message -- From: Core operations via RT core-...@rt.wikimedia.org Date: Thu, Sep 26, 2013 at 1:01 PM Subject: [wikimedia #5841] etherpad.wikimedia.org downtime due to upgrade To: akosia...@wikimedia.org I am scheduling a downtime for etherpad.wikimedia.org on Monday 30/09/2013 in order to upgrade it to the latest released version. The downtime is scheduled to last one (1) hour and will start at 09:00 UTC. We will be upgrading from 1.0 (released 2 years ago) to 1.2.11 (released 3 months ago). The package has already been created and will be made available on apt.wikimedia.org during the upgrade. Hopefully a lot of the bugs we have witnessed that cause problems in etherpad.wikimedia.org will be resolved. -- (ticket has been created) -- Alexandros Kosiaris akosia...@wikimedia.org ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l