[Wikitech-l] Re: March 2024 Switchover completed successfully

2024-03-20 Thread Alexandros Kosiaris
A minor correction: we will be switching to Dallas on Wednesday, September
25th.

On Wed, Mar 20, 2024 at 5:23 PM Alexandros Kosiaris 
wrote:

> Hello everyone,
>
> Please join us in celebrating a very successful Datacenter Switchover.
> This switch to our data center in Virginia was run by Effie Mouzeli.
> Despite a minor hiccup with Effie's network connection (a similar thing
> happened to Clément a year ago; this is starting to become a pattern), it
> was completed without a hitch.
>
> For context, the Site Reliability Team (SRE) runs a planned data center
> switchover periodically, moving all wikis from our primary data center
> (for this instance, Texas) to the secondary data center (for this instance,
> Virginia). This is an important periodic test of our tools and procedures,
> to ensure the wikis will continue to be available even in the event of
> major technical issues. It also gives all our SRE and ops teams a chance to
> do maintenance and upgrades on systems that normally run 24 hours a day.
>
> The switchover process requires a brief read-only period for all
> Foundation-hosted wikis, which started at 14:00 UTC on Wednesday March
> 20th, and lasted 3 minutes and 8 seconds. All our public and private wikis
> continued to be available for reading as usual. Users saw a notification of
> the upcoming maintenance, and anyone still editing was asked to try again
> in a few minutes.
>
> As with the previous Switchover, I've been trying to discern the effect
> of the Switchover in the many graphs we use to monitor the infrastructure
> at https://grafana.wikimedia.org. In many of them, the event is impossible
> to spot. We consider this very nice and attribute it to various
> improvements made over the years by many teams, in and outside SRE. The
> most discernible graph we have is that of the edit rate.
>
> This switchover is our first where we are predominantly on MediaWiki on
> Kubernetes, setting a very nice milestone for the project.
>
> As per our newer process, we no longer have a Switchback. We will be
> staying in Virginia as our primary data center for the next 6 months,
> switching back to Virginia on Wednesday, September 25th.
>
> As always, my deepest thanks to all people that have helped with this, in
> one way or another, ranging from the person running point, to all SREs and
> developers/deployers participating or having contributed, to people in
> Movement Communications for helping with the messaging.
>
> To report any issues, you can reach us in #wikimedia-sre on IRC, or file a
> Phabricator ticket with the datacenter-switchover tag (pre-filled form
> here); we'll be monitoring closely for reports of trouble during and after
> the switchover. (If you're new to Phab, there's more information at
> Phabricator/Help.) The switchover, its preparation, and follow-up actions
> are tracked in Phabricator task T357547.
>
> --
> Alexandros Kosiaris
> Principal Site Reliability Engineer
> Wikimedia Foundation
>


-- 
Alexandros Kosiaris
Principal Site Reliability Engineer
Wikimedia Foundation
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

[Wikitech-l] March 2024 Switchover completed successfully

2024-03-20 Thread Alexandros Kosiaris
Hello everyone,

Please join us in celebrating a very successful Datacenter Switchover. This
switch to our data center in Virginia was run by Effie Mouzeli. Despite
a minor hiccup with Effie's network connection (a similar thing happened
to Clément a year ago; this is starting to become a pattern), it was
completed without a hitch.

For context, the Site Reliability Team (SRE) runs a planned data center
switchover periodically, moving all wikis from our primary data center
(for this instance, Texas) to the secondary data center (for this instance,
Virginia). This is an important periodic test of our tools and procedures,
to ensure the wikis will continue to be available even in the event of
major technical issues. It also gives all our SRE and ops teams a chance to
do maintenance and upgrades on systems that normally run 24 hours a day.

The switchover process requires a brief read-only period for all
Foundation-hosted wikis, which started at 14:00 UTC on Wednesday March
20th, and lasted 3 minutes and 8 seconds. All our public and private wikis
continued to be available for reading as usual. Users saw a notification of
the upcoming maintenance, and anyone still editing was asked to try again
in a few minutes.
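
For anyone who wants to script around such a window: a minimal sketch (not
official tooling; the wiki URL is only an example) that polls MediaWiki's
siteinfo API for the read-only flag until editing is possible again:

    # Minimal sketch: wait until a Foundation-hosted wiki leaves read-only
    # mode. Assumes the public action API; any wiki's api.php works here.
    import time
    import requests

    API = "https://en.wikipedia.org/w/api.php"  # example wiki

    def is_read_only():
        params = {
            "action": "query",
            "meta": "siteinfo",
            "siprop": "general",
            "format": "json",
            "formatversion": "2",
        }
        general = requests.get(API, params=params, timeout=10).json()
        # "readonly" is only present/true while the wiki is read-only;
        # "readonlyreason" carries the maintenance message shown to editors.
        return bool(general["query"]["general"].get("readonly"))

    while is_read_only():
        print("wiki is read-only, retrying in 30s ...")
        time.sleep(30)
    print("wiki is writable again")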

As with the previous Switchover, I've been trying to discern the effect of
the Switchover in the many graphs we use to monitor the infrastructure at
https://grafana.wikimedia.org. In many of them, the event is impossible to
spot. We consider this very nice and attribute it to various improvements
made over the years by many teams, in and outside SRE. The most
discernible graph we have is that of the edit rate.
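
For the curious, such data can also be pulled programmatically; here is a
hedged sketch against a Prometheus-compatible query_range API, where both
the endpoint and the metric name are placeholders rather than the actual
Wikimedia ones:

    # Hedged sketch: fetch an edit-rate style series around the read-only
    # window. PROM and the metric name are placeholders, not real values.
    import requests

    PROM = "https://prometheus.example.org/api/v1/query_range"
    resp = requests.get(PROM, params={
        "query": "sum(rate(mediawiki_edits_total[5m]))",  # hypothetical metric
        "start": "2024-03-20T13:30:00Z",
        "end": "2024-03-20T14:30:00Z",
        "step": "60s",
    }, timeout=30).json()
    # Each result holds [timestamp, value] pairs; a dip should be visible
    # around the 14:00 UTC read-only window.
    for series in resp["data"]["result"]:
        for ts, value in series["values"]:
            print(ts, value)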

This switchover is our first where we are predominantly on MediaWiki on
Kubernetes, setting a very nice milestone for the project.

As per our newer process, we no longer have a Switchback. We will be
staying in Virginia as our primary data center for the next 6 months,
switching back to Virginia on Wednesday, September 25th.

As always, my deepest thanks to all people that have helped with this, in
one way or another, ranging from the person running point, to all SREs and
developers/deployers participating or having contributed, to people in
Movement Communications for helping with the messaging.

To report any issues, you can reach us in #wikimedia-sre on IRC, or file a
Phabricator ticket with the datacenter-switchover tag (pre-filled form
here); we'll be monitoring closely for reports of trouble during and after
the switchover. (If you're new to Phab, there's more information at
Phabricator/Help.) The switchover, its preparation, and follow-up actions
are tracked in Phabricator task T357547.

-- 
Alexandros Kosiaris
Principal Site Reliability Engineer
Wikimedia Foundation
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

[Wikitech-l] Re: User style to reduce Gerritbot comments on Phabricator

2023-12-04 Thread Alexandros Kosiaris
I've just installed it too, and it is indeed really nice. Thanks for this!

On Thu, Nov 30, 2023 at 2:37 AM Bartosz Dziewoński 
wrote:

> On 2023-11-29 23:20, Zoran Dori wrote:
> > Wow, thank you Bartosz, it looks AMAZING!
> >
> > How can we install it?
>
> Thanks :)
>
> You'll need to install a browser extension (add-on) that allows adding
> user styles, and then add this style to it.
>
> There are buttons on the page I linked that should somewhat guide you
> through it. The simplest way is to click "Get Stylus", follow the links
> there to your browser's add-ons site, and install it; then come back to
> that page, click "Install" and then confirm it.
>
>
> --
> Bartosz Dziewoński
> ___
> Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
> To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/



-- 
Alexandros Kosiaris
Principal Site Reliability Engineer
Wikimedia Foundation
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

[Wikitech-l] Datacenter Switchover process change

2023-08-31 Thread Alexandros Kosiaris
Hello,

I'd like to inform everyone of what we consider a big change (for the
better) in the Datacenter Switchover process. The full rationale, planning,
and implementation are documented at

https://wikitech.wikimedia.org/wiki/Switch_Datacenter/Recurring,_Equinox-based,_Data_Center_Switchovers

and it includes a TL;DR that I am pasting below for everyone's convenience:

Site Reliability Engineering will, starting September 2023, run a data
center Switchover every 6 months, in the week of the solar Equinox
<https://en.wikipedia.org/wiki/Equinox>, namely the *work weeks containing
March 21st and September 21st*. If you are interested in learning more about
Switchovers and why we perform them, or already know what they are and want
to learn more about how this proposal would impact your workflows or the
Wikimedia Movement, please read on.
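
To make the schedule concrete, a small worked example in plain Python (it
takes the simple reading of "the week containing" each date; years where
the anchor date falls on a weekend may be scheduled differently in
practice):

    # Worked example: Monday of the ISO work week containing March 21st and
    # September 21st, i.e. the weeks in which a Switchover would fall.
    from datetime import date, timedelta

    def switchover_week_monday(anchor):
        """Return the Monday of the ISO week containing `anchor`."""
        return anchor - timedelta(days=anchor.weekday())

    for year in (2023, 2024):
        for month, day in ((3, 21), (9, 21)):
            monday = switchover_week_monday(date(year, month, day))
            print(year, month, "-> week of", monday.isoformat())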


We hope that making the Switchover dates and duration predictable will
allow the teams involved in and/or utilizing a Switchover, as well as the
entire movement, to reap the benefits we anticipate and document in the
page linked above.


Regards,
--
Alexandros Kosiaris
Principal Site Reliability Engineer
Wikimedia Foundation
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

[Wikitech-l] Re: eqiad wikikube kubernetes cluster upgrade on 2023-03-07

2023-03-09 Thread Alexandros Kosiaris
Hello everyone,

Upgrade done. The cluster has been successfully upgraded to 1.23 and the
applications have just been redeployed. Toolhub is operational again.

On Fri, Mar 3, 2023 at 3:45 PM Alexandros Kosiaris 
wrote:

> Hello everyone,
>
> TL;DR Toolhub will have a few hours of downtime due to maintenance on
> Tuesday 2023-03-07. Furthermore, if you are not deploying services to the
> eqiad wikikube kubernetes cluster, you can safely skip the rest.
>
> Long version:
>
> We will reinitialize the eqiad wikikube kubernetes cluster using
> kubernetes version 1.23 on 2023-03-07 09:00-16:00 UTC [1] (the actual
> process is expected to take a couple of hours within this window).
> The date was chosen for convenience: due to the data center switchover
> process, eqiad is fully depooled, receiving almost 0 traffic. This is
> scheduled to change on 2023-03-08, which would make the process more
> difficult. As all traffic has already been drained, we expect no visible
> impact. However, for the duration of the process, the kubernetes cluster
> will be unavailable to deployers, and thus efforts to deploy to it will
> fail or, worse, not have the expected outcomes.
> This is normal until SRE serviceops announces that the cluster is fully
> operational again.
>
> SRE serviceops will be deploying all services before marking the cluster as
> usable so there will be no need for deployers to
> re-deploy their services (apart from those already informed).
>
> Toolhub, per https://phabricator.wikimedia.org/T329319, wasn't switched
> over to codfw and is still being served from wikikube eqiad. Unavoidably,
> it will suffer a small downtime of a few hours. That is known and expected.
> To minimize that downtime, it will be prioritized during the initialization
> phase.
>
> [1] https://phabricator.wikimedia.org/T331126
>
> --
> Alexandros Kosiaris
> Principal Site Reliability Engineer
> Wikimedia Foundation
>


-- 
Alexandros Kosiaris
Principal Site Reliability Engineer
Wikimedia Foundation
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

[Wikitech-l] eqiad wikikube kubernetes cluster upgrade on 2023-03-07

2023-03-03 Thread Alexandros Kosiaris
Hello everyone,

TL;DR Toolhub will have a few hours of downtime due to maintenance on
Tuesday 2023-03-07. Furthermore, if you are not deploying services to the
eqiad wikikube kubernetes cluster, you can safely skip the rest.

Long version:

We will reinitialize the eqiad wikikube kubernetes cluster using kubernetes
version 1.23 on 2023-03-07 09:00-16:00 UTC [1] (the actual process is
expected to take a couple of hours within this window).
The date was chosen for convenience: due to the data center switchover
process, eqiad is fully depooled, receiving almost 0 traffic. This is
scheduled to change on 2023-03-08, which would make the process more
difficult. As all traffic has already been drained, we expect no visible
impact. However, for the duration of the process, the kubernetes cluster
will be unavailable to deployers, and thus efforts to deploy to it will
fail or, worse, not have the expected outcomes.
This is normal until SRE serviceops announces that the cluster is fully
operational again.
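
As a guard for deploy scripts, a hedged sketch (this is not part of the
official deploy tooling, and it assumes kubectl is configured against the
cluster) that refuses to proceed unless the control plane already reports
the announced version:

    # Hedged sketch: abort a deploy unless the cluster reports the expected
    # post-upgrade version. Assumes a configured kubectl; not WMF tooling.
    import json
    import subprocess

    EXPECTED = "1.23"  # version announced for the reinitialized cluster

    out = subprocess.check_output(["kubectl", "version", "-o", "json"], text=True)
    v = json.loads(out)["serverVersion"]
    version = v["major"] + "." + v["minor"].rstrip("+")
    if version != EXPECTED:
        raise SystemExit("cluster reports %s, not %s; hold your deploy" % (version, EXPECTED))
    print("cluster reports", EXPECTED, "- wait for the serviceops all-clear, then deploy")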

SRE serviceops will be deploying all services before marking the cluster as
usable so there will be no need for deployers to
re-deploy their services (apart from those already informed).

Toolhub, per https://phabricator.wikimedia.org/T329319, wasn't switched over
to codfw and is still being served from wikikube eqiad. Unavoidably, it
will suffer a small downtime of a few hours. That is known and expected. To
minimize that downtime, it will be prioritized during the initialization
phase.

[1] https://phabricator.wikimedia.org/T331126

-- 
Alexandros Kosiaris
Principal Site Reliability Engineer
Wikimedia Foundation
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

[Wikitech-l] Re: [Ops] codfw wikikube kubernetes cluster upgrade on 2023-02-21

2023-02-21 Thread Alexandros Kosiaris
Hello everyone,

The cluster was successfully re-initialized today; all services have been
re-pooled and are in service. The cluster is fully operational again and
can be used by deployers.

Regards,

On Wed, Feb 15, 2023 at 1:41 PM Janis Meybohm 
wrote:

> Hello everyone,
>
> TL;DR if you are not deploying services to the codfw wikikube kubernetes
> cluster, you can safely skip this.
>
> Long version:
>
> We will reinitialize the codfw wikikube kubernetes cluster with kubernetes
> version 1.23 on 2023-02-21 09:00-16:00 UTC [1] (the actual process is
> expected to take a couple of hours within this window).
> The date was chosen for convenience as we will have depooled all
> active/active services from codfw for row B switch maintenance [2] anyway.
> As all traffic will be drained beforehand, we expect no user visible
> impact. However, for the duration of the process, the kubernetes cluster
> will be unavailable to deployers, and thus efforts to deploy to it will
> fail or, worse, not have the expected outcomes.
> This is normal until SRE serviceops announces that the cluster is fully
> operational again.
>
> SRE serviceops will be deploying all services before marking the cluster
> as
> usable and pooling traffic back to it, so there will be no need for
> deployers to
> re-deploy their services (apart from those already informed).
>
> [1] https://phabricator.wikimedia.org/T329664
> [2] https://phabricator.wikimedia.org/T327991
>
> Regards,
> Janis Meybohm
>
> ___
> Ops mailing list -- o...@lists.wikimedia.org
> To unsubscribe send an email to ops-le...@lists.wikimedia.org
>


-- 
Alexandros Kosiaris
Principal Site Reliability Engineer
Wikimedia Foundation
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

Re: [Wikitech-l] eqiad kubernetes cluster upgrade on 2021-03-23

2021-03-23 Thread Alexandros Kosiaris
Hello everyone,

This has happened. The cluster has been reinitialized and upgraded, and
all services have been redeployed by SRE Service Operations. So the
cluster is fully operational again; feel free to deploy. Traffic hasn't
been switched back yet, as we are still making sure the cluster is also
fully capable of serving traffic, but that is expected to happen by
tomorrow at the latest.

On Tue, Mar 23, 2021 at 10:23 AM Alexandros Kosiaris
 wrote:
>
> Hello everyone,
>
> This is starting now. Keep in mind that if you try to deploy to eqiad
> k8s today, it WILL fail or just won't do what you expect it to do.
>
> On Fri, Mar 19, 2021 at 10:02 PM Alexandros Kosiaris
>  wrote:
> >
> > Hello everyone,
> >
> > TL;DR if you are not deploying services to the eqiad kubernetes
> > cluster, you can safely skip this.
> >
> > Long version:
> >
> > After having tested our cluster reinitialization procedure three times,
> > next week, on Tuesday 2021-03-23, we will be reinitializing our eqiad
> > kubernetes cluster. All
> > traffic will be drained from it beforehand and we expect no user
> > visible impact. However, for the duration of the process, the
> > kubernetes eqiad cluster will be unavailable to deployers and thus
> > efforts to deploy to it will fail or worse, not have the expected
> > outcomes. This is normal until SRE serviceops announces that the
> > cluster is fully operational again.
> >
> > SRE service-ops will be deploying all services before marking the
> > cluster as usable and pooling traffic back to it, so there will be no
> > need for deployers to re-deploy their services.
> >
> > For your convenience the list of services that are currently deployed
> > on that cluster is: apertium api-gateway blubberoid changeprop
> > changeprop-jobqueue citoid cxserver echostore eventgate-analytics
> > eventgate-analytics-external eventgate-logging-external eventgate-main
> > eventstreams eventstreams-internal linkrecommendation mathoid
> > mobileapps proton push-notifications recommendation-api sessionstore
> > similar-users termbox wikifeeds zotero
> >
> > Regards,
> >
> > --
> > Alexandros Kosiaris
> > Principal Site Reliability Engineer
> > Wikimedia Foundation
>
>
>
> --
> Alexandros Kosiaris
> Principal Site Reliability Engineer
> Wikimedia Foundation



-- 
Alexandros Kosiaris
Principal Site Reliability Engineer
Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] eqiad kubernetes cluster upgrade on 2021-03-23

2021-03-23 Thread Alexandros Kosiaris
Hello everyone,

This is starting now. Keep in mind that if you try to deploy to eqiad
k8s today, it WILL fail or just won't do what you expect it to do.

On Fri, Mar 19, 2021 at 10:02 PM Alexandros Kosiaris
 wrote:
>
> Hello everyone,
>
> TL;DR if you are not deploying services to the eqiad kubernetes
> cluster, you can safely skip this.
>
> Long version:
>
> After having tested our cluster reinitialization procedure three times,
> next week, on Tuesday 2021-03-23, we will be reinitializing our eqiad
> kubernetes cluster. All
> traffic will be drained from it beforehand and we expect no user
> visible impact. However, for the duration of the process, the
> kubernetes eqiad cluster will be unavailable to deployers and thus
> efforts to deploy to it will fail or worse, not have the expected
> outcomes. This is normal until SRE serviceops announces that the
> cluster is fully operational again.
>
> SRE service-ops will be deploying all services before marking the
> cluster as usable and pooling traffic back to it, so there will be no
> need for deployers to re-deploy their services.
>
> For your convenience the list of services that are currently deployed
> on that cluster is: apertium api-gateway blubberoid changeprop
> changeprop-jobqueue citoid cxserver echostore eventgate-analytics
> eventgate-analytics-external eventgate-logging-external eventgate-main
> eventstreams eventstreams-internal linkrecommendation mathoid
> mobileapps proton push-notifications recommendation-api sessionstore
> similar-users termbox wikifeeds zotero
>
> Regards,
>
> --
> Alexandros Kosiaris
> Principal Site Reliability Engineer
> Wikimedia Foundation



-- 
Alexandros Kosiaris
Principal Site Reliability Engineer
Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] eqiad kubernetes cluster upgrade on 2021-03-23

2021-03-19 Thread Alexandros Kosiaris
Hello everyone,

TL;DR if you are not deploying services to the eqiad kubernetes
cluster, you can safely skip this.

Long version:

After having tested our cluster reinitialization procedure three times,
next week, on Tuesday 2021-03-23, we will be reinitializing our eqiad
kubernetes cluster. All
traffic will be drained from it beforehand and we expect no user
visible impact. However, for the duration of the process, the
kubernetes eqiad cluster will be unavailable to deployers and thus
efforts to deploy to it will fail or worse, not have the expected
outcomes. This is normal until SRE serviceops announces that the
cluster is fully operational again.

SRE service-ops will be deploying all services before marking the
cluster as usable and pooling traffic back to it, so there will be no
need for deployers to re-deploy their services.

For your convenience the list of services that are currently deployed
on that cluster is: apertium api-gateway blubberoid changeprop
changeprop-jobqueue citoid cxserver echostore eventgate-analytics
eventgate-analytics-external eventgate-logging-external eventgate-main
eventstreams eventstreams-internal linkrecommendation mathoid
mobileapps proton push-notifications recommendation-api sessionstore
similar-users termbox wikifeeds zotero

Regards,

--
Alexandros Kosiaris
Principal Site Reliability Engineer
Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] codfw kubernetes cluster upgrade this week

2021-03-16 Thread Alexandros Kosiaris
Hi,

This has happened now. Out of an abundance of caution, the cluster
isn't going to be repooled right now but rather tomorrow EU morning;
it's otherwise fully operational. Deploys will be fully functional
again, so if anything breaks, please let us know in phabricator.

The related task is https://phabricator.wikimedia.org/T277191 if you care
to follow the last few steps.

On Tue, Mar 16, 2021 at 10:31 AM Alexandros Kosiaris
 wrote:
>
> Hello everyone,
>
> TL;DR if you are not deploying services to the codfw kubernetes
> cluster, you can safely skip this.
>
> Long version:
>
> After having tested our cluster reinitialization procedure twice, this
> week we will be reinitializing our codfw kubernetes cluster. All
> traffic will be drained from it beforehand and we expect no user
> visible impact. However, for the duration of the process, the
> kubernetes codfw cluster will be unavailable to deployers and thus
> efforts to deploy to it will fail or worse, not have the expected
> outcomes. This is normal until SRE serviceops announces that the
> cluster is fully operational again.
>
> SRE service-ops will be deploying all services before marking the
> cluster as usable and pooling traffic back to it, so there will be no
> need for deployers to re-deploy their services.
>
> For your convenience the list of services that are currently deployed
> on that cluster is: apertium api-gateway blubberoid changeprop
> changeprop-jobqueue citoid cxserver echostore eventgate-analytics
> eventgate-analytics-external eventgate-logging-external eventgate-main
> eventstreams eventstreams-internal linkrecommendation mathoid
> mobileapps proton push-notifications recommendation-api sessionstore
> similar-users termbox wikifeeds zotero
>
> Regards,
>
> --
> Alexandros Kosiaris
> Principal Site Reliability Engineer
> Wikimedia Foundation



-- 
Alexandros Kosiaris
Principal Site Reliability Engineer
Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] codfw kubernetes cluster upgrade this week

2021-03-16 Thread Alexandros Kosiaris
Hello everyone,

TL;DR if you are not deploying services to the codfw kubernetes
cluster, you can safely skip this.

Long version:

After having tested our cluster reinitialization procedure twice, this
week we will be reinitializing our codfw kubernetes cluster. All
traffic will be drained from it beforehand and we expect no user
visible impact. However, for the duration of the process, the
kubernetes codfw cluster will be unavailable to deployers and thus
efforts to deploy to it will fail or worse, not have the expected
outcomes. This is normal until SRE serviceops announces that the
cluster is fully operational again.

SRE service-ops will be deploying all services before marking the
cluster as usable and pooling traffic back to it, so there will be no
need for deployers to re-deploy their services.

For your convenience the list of services that are currently deployed
on that cluster is: apertium api-gateway blubberoid changeprop
changeprop-jobqueue citoid cxserver echostore eventgate-analytics
eventgate-analytics-external eventgate-logging-external eventgate-main
eventstreams eventstreams-internal linkrecommendation mathoid
mobileapps proton push-notifications recommendation-api sessionstore
similar-users termbox wikifeeds zotero

Regards,

-- 
Alexandros Kosiaris
Principal Site Reliability Engineer
Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] Fwd: Kubelet / Docker / dockershim

2020-12-08 Thread Alexandros Kosiaris
Forwarding this from the upstream kubernetes mailing list.

The TL;DR is that with the release that is due for September 2021
(assuming that happens as planned), Docker will no longer be usable as
a Container Runtime Engine for vanilla Kubernetes. And that's it. All
other usages of Docker remain unchanged.

Given the support cycle of 12 months after a release is out, that
gives us a little less than 2 years to evaluate the available
replacements, settle on one, and draft and implement a migration plan.
It's a pretty early warning, which is nice.

The above sounds more complicated than it will probably prove to be, for
what it's worth (although the devil is always in the details). As far as
running services in our Wikimedia production kubernetes clusters goes,
we purposely never invested in Docker-specific features/customizations,
choosing to treat it as a replaceable part of the infrastructure, which
should make this easier than initially thought.
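
For anyone curious about their own clusters' exposure, a minimal sketch
using the official kubernetes Python client (assumes a working kubeconfig;
not something we run in production):

    # Minimal sketch: report which container runtime each node uses, to
    # gauge exposure to the dockershim removal. Assumes a valid kubeconfig.
    from kubernetes import client, config

    config.load_kube_config()
    for node in client.CoreV1Api().list_node().items:
        # e.g. "docker://19.3.x" before a migration, "containerd://1.4.x" after
        print(node.metadata.name, node.status.node_info.container_runtime_version)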

I've created a task for tracking: https://phabricator.wikimedia.org/T269684

-- Forwarded message -
From: Davanum Srinivas 
Date: Sun, 6 Dec 2020, 05:53
Subject: Kubelet / Docker / dockershim
To: Kubernetes developer/contributor discussion
,



Folks,

If you haven't seen the discussions around $SUBJECT, please see [1]
and [2]. Tl;dr Please evaluate and switch to CRI implementations that
are or will be available in the community (like containerd, cri-o
etc).

For those who want to continue to use docker as their runtime, please
see [3] and [4]. There will be changes to how you deploy/run your
clusters as and when Mirantis/Docker folks come up with a migration
plan for a separate (new!) external cri implementation. So watch that
space.

Issues, concerns, we can chat in sig-node slack channel or meetings
(or drop a reply to this note).

Thanks,
Dims

[1] https://kubernetes.io/blog/2020/12/02/dont-panic-kubernetes-and-docker/
[2] https://kubernetes.io/blog/2020/12/02/dockershim-faq/
[3] https://twitter.com/justincormack/status/1334976974083780609
[4] https://github.com/Mirantis/cri-dockerd




--
Davanum Srinivas :: https://twitter.com/dims

--
You received this message because you are subscribed to the Google
Groups "Kubernetes developer/contributor discussion" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to kubernetes-dev+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/kubernetes-dev/CANw6fcHRq%2BadjSkrt1dVQfFtcEc1sqtWRY7LktDFxKLt537Kkg%40mail.gmail.com.


--
Alexandros Kosiaris
Principal Site Reliability Engineer
Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] Fwd: OTRS major version upgrade on Monday September 14th 2020

2020-09-16 Thread Alexandros Kosiaris
FYI for wikitech-l as well.

-- Forwarded message -
From: Alexandros Kosiaris 
Date: Wed, Sep 16, 2020 at 11:16 AM
Subject: Re: OTRS major version upgrade on Monday September 14th 2020
To: Private list for OTRS adminstrators
, 


Hello everyone,

This is to inform you that the upgrade of OTRS to 6.0.29 is now
complete. The migration has gone well and thankfully finished on
time. The backlog of incoming emails seems to have arrived in the
system as expected, and new tickets have been created. I've validated
the various functionalities that I had planned to validate, and
everything seems to be in order (have a look at T187984 if you are
interested in the technical details). https://ticket.wikimedia.org is
now available again for normal use.

As before, bugs that may have arisen due to the upgrade should be
filed under the OTRS project in phabricator, preferably under the OTRS
6 column.

Thank you for your patience,

On Fri, Sep 4, 2020 at 2:20 PM Alexandros Kosiaris
 wrote:
>
> Hello everyone,
>
> This is to let you know that per
> https://phabricator.wikimedia.org/T187984#6433997 on September 14th
> 2020, a 48-hour maintenance window for the OTRS upgrade from 5.0 to 6.0
> will begin. While this is an unusually large time window, the migration
> process from 5.0 to 6.0 is rather slow. We have looked into optimizing it,
> but in the end preferred to stick with the software developers'
> recommendations.
>
> During the maintenance window the system will be completely offline. That 
> means:
>
> * No access over the web to the interface
> * No scheduled jobs of any kind will be run
> * Email will not be delivered but rather backlogged. It will not be
> lost, as our MX systems will accept the messages and put them in the
> queue. Once the system is back to being fully functional, the emails
> will flow into the system.
>
> We will have a rollback plan ready of course, just in case the
> migration goes awry. We have already tested it, but something might
> arise anyway.
>
> I am sorry for any inconvenience this might cause.
>
> --
> Alexandros Kosiaris
> Principal Site Reliability Engineer
> Wikimedia Foundation



--
Alexandros Kosiaris
Principal Site Reliability Engineer
Wikimedia Foundation


-- 
Alexandros Kosiaris
Principal Site Reliability Engineer
Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] OTRS major version upgrade on Monday September 14th 2020

2020-09-04 Thread Alexandros Kosiaris
I've already sent the email below to OTRS-related mailing lists, but
I think wikitech-l can also benefit from this, so here it is.

--
Hello everyone,

This is to let you know that per
https://phabricator.wikimedia.org/T187984#6433997 on September 14th
2020, a 48-hour maintenance window for the OTRS upgrade from 5.0 to 6.0
will begin. While this is an unusually large time window, the migration
process from 5.0 to 6.0 is rather slow. We have looked into optimizing it,
but in the end preferred to stick with the software developers'
recommendations.

During the maintenance window the system will be completely offline. That means:

* No access over the web to the interface
* No scheduled jobs of any kind will be run
* Email will not be delivered but rather backlogged. It will not be
lost, as our MX systems will accept the messages and put them in the
queue. Once the system is back to being fully functional, the emails
will flow into the system.

We will have a rollback plan ready of course, just in case the
migration goes awry. We have already tested it, but something might
arise anyway.

I am sorry for any inconvenience this might cause.

--
Alexandros Kosiaris
Principal Site Reliability Engineer
Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] Debian Jessie base Wikimedia container image being phased out

2020-06-30 Thread Alexandros Kosiaris
Hello everyone,

In the interest of proceeding with the deprecation of Jessie in our
infrastructure, SRE ServiceOps will stop maintaining and eventually
remove the base Debian Jessie OCI container image[1], also known as
wikimedia-jessie, from our docker registry.

If you rely on this image in any way, you are strongly urged to move
to our Debian Stretch or Buster images, which will continue to be
maintained for some time (for more information see
https://wikitech.wikimedia.org/wiki/Operating_system_upgrade_policy).

A tentative timeline follows:

* 2020-07-02. https://gerrit.wikimedia.org/r/c/operations/puppet/+/587529
will be merged. This means the image will no longer receive any kind of
updates, but it will still be possible to pull it from the registry.
Workflows are not expected to break.
* 2020-08-03. All tags and versions of the above image will be removed,
which means the image will no longer be pullable from the registry. If
you have any workflows that rely on pulling this image from the
registry, they are expected to break on this date. Images that have
already been published and are based on the Jessie image, however, will
not be touched and will still be pullable.

If you have a workflow that will break after the 2020-08-03 mark and
are unable to alter it, please reach out to us.
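
If you want to audit a local checkout for affected images, a hedged helper
sketch (the replacement names follow the naming pattern above, but verify
the exact image names against the registry listing [1]):

    # Hedged sketch: flag Dockerfiles still based on the wikimedia-jessie
    # image. The suggested replacement names are assumptions; check the
    # registry listing before switching.
    import pathlib
    import re

    JESSIE_BASE = re.compile(r"^FROM\s+\S*wikimedia-jessie", re.MULTILINE)

    for dockerfile in pathlib.Path(".").rglob("Dockerfile*"):
        if JESSIE_BASE.search(dockerfile.read_text(errors="replace")):
            print(dockerfile, ": still based on wikimedia-jessie;",
                  "consider wikimedia-stretch or wikimedia-buster")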

Following the above, we will also investigate whether removing image
versions and tags that depend on the removed image makes sense.

Note that this is the first time we are removing a base image from our
registry, and as such we are still building knowledge and experience
around this.

[1] https://tools.wmflabs.org/dockerregistry/wikimedia-jessie/tags/

Regards,

-- 
Alexandros Kosiaris
Principal Site Reliability Engineer
Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Scrum of scrums/2019-11-27

2019-11-27 Thread Alexandros Kosiaris
I think a correction is due.

* RelEng blocking Product Infrastructure: to create node10/buster images
for Proton service migration [[phab:T237911]]

This is actually SRE blocking Product Infrastructure, as SRE owns those images.

On Wed, Nov 27, 2019 at 7:03 PM Željko Filipin  wrote:
>
> Hi,
>
> for HTML version see
> https://www.mediawiki.org/wiki/Scrum_of_scrums/2019-11-27
>
> Željko
>
> --
>
> = 2019-11-27 =
>
> == Callouts ==
> * Release Engineering - unusual train schedule:
> *** This week: 1.35.0-wmf.8 - group0 only because of Thanksgiving
> *** Next week: 1.35.0-wmf.8 - group1 + group2
> * RelEng blocking Product Infrastructure: to create node10/buster images
> for Proton service migration [[phab:T237911]]
> * Biggest fundraising campaign of the year hits enwiki (and in Canada also
> frwiki) on Dec 2. Let's keep CentralNotice stable!
>
> == Product ==
>
> === Android native app ===
> * Updates:
> ** Suggested Edits v3 features and fundraising banner updates are now live
> in production.
> ** Working on mobile-html integration: [[phab:project/view/4318]]
>
> === Product Infrastructure ===
> * Blocked by:
> ** RelEng: to create node10/buster images for Proton service migration
> [[phab:T237911]]
> * Updates:
> ** Maps:
> *** Investigating new OSM replication engine [[phab:T238554]]
> ** Proton:
> *** Moving Proton to debian buster, blocked on RelEng for node10/buster
> images [[phab:T237911]]
>
> === Structured Data ===
> * Blocking:
> ** Search Platform: Data dumps for SDC: [[phab:T221917]]
> * Updates:
> ** finishing off computer-aided tagging
> ** finishing off Lua support
> ** finishing off blockers for structured data in dumps
> ** adding new input types
>
> === Inuka ===
> * Updates:
> ** KaiOS app settings menu [[phab:T236265]] [[phab:T236312]]
> [[phab:T236314]]
>
> == Technology ==
>
> === Fundraising Tech ===
> * Updates:
> ** Battening down the hatches for the December fundraiser
> ** CiviCRM
> *** Fixing duplicate mailing event records imported from bulk mail house
> *** Investigating weird output from audit file parser for backup credit
> card processor
> *** Adding UI buttons to send end-of-year rollup emails on demand
> ** CentralNotice
> *** Reviewing and finishing up sub-national geotargeting
> *** Still comparing stats between new and old data pipelines
> ** Paymentswiki
> *** small tweaks to CSS and fraud filters
>
> === Core Platform ===
> * Updates:
> ** Page history REST endpoints out on train
> ** "Endgame" API gateway planning
> ** MCR Schema conversion
>
> === Engineering Productivity ===
>
>  Release Engineering 
> * Blocking:
> ** Product Infrastructure: to create node10/buster images for Proton
> service migration [[phab:T237911]]
> * Updates:
> ** Train Health
> *** Last week: no train because of team offsite
> *** This week: 1.35.0-wmf.8 - [[phab:T233856]] - group0 only because of
> Thanksgiving
> *** Next week: 1.35.0-wmf.8 - [[phab:T233856]] - group1 + group2
>
> === Search Platform ===
> * Blocked by:
> ** Structured Data: Data dumps for SDC: [[phab:T221917]]
> * Updates:
> ** Log Wikidata Query Service queries (sparql) to the event gate
> infrastructure [[phab:T101013]] (soon to be deployed)
> ** Increase logging sampling rates for search metrics from 12.5% to 100%
> (8x) [[phab:T197129]]
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l



-- 
Alexandros Kosiaris
Principal Site Reliability Engineer
Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Scrum of scrums/2019-06-19

2019-06-20 Thread Alexandros Kosiaris
> Also, is Site Reliability Engineering blocking WMDE or not?
>
> === Site Reliability Engineering ===
> * Blocking:
> ** WMDE has been unblocked on termbox

No, we (SRE) are no longer blocking WMDE; that was a mistake on my part
in the notes and should have been filed under updates.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Small change to wmf apache log format

2019-03-20 Thread Alexandros Kosiaris
In light of the gerrit incidents of these last few days, and as part of
the process of strengthening gerrit's operational security, we've
just gone ahead and configured gerrit to add the User: HTTP header to
the response. To take advantage of that, we've also amended the wmf
apache LogFormat directive to log that header if it exists. I've
documented the change at
https://wikitech.wikimedia.org/wiki/Apache_log_format.

Note that the order of fields changes just a bit (the last field is
now the 17th instead of the 16th; the 16th is now the User: HTTP header
if it exists, otherwise a -). If you are aware of anything that might
break because of that, let us know.
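
For scripts that consume these logs, a hedged sketch of the adjustment (the
authoritative field layout is on the wikitech page above; the naive
whitespace split is only illustrative):

    # Illustrative only: lines now carry 17 fields instead of 16, with field
    # 16 holding the User: response header (or "-" when it is unset). A real
    # parser must follow https://wikitech.wikimedia.org/wiki/Apache_log_format
    # and handle fields that can embed spaces (user-agent, referer, ...).
    def parse_user(line):
        fields = line.split()
        if len(fields) < 17:
            raise ValueError("looks like the old 16-field format")
        return fields[15]  # 16th field: authenticated user, "-" if none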

Regards,

-- 
Alexandros Kosiaris
Senior Site Reliability Engineer
Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Gerrit outage

2019-03-19 Thread Alexandros Kosiaris
Hello Fæ,

While I understand and agree with your point, I must point out that
these 4 days have been hectic for many people from multiple teams. The
amount of work needed to clean up one person's destructive half-hour
spree is staggering. We certainly need better tooling to combat this,
something that MediaWiki is already equipped with but that some of the
infrastructure tools are not (yet, hopefully). It saddens me greatly to
say this, but we might have to take some steps in the opposite
direction, for a while at least, until we are in shape to combat this
more effectively.

Regards,

On Tue, Mar 19, 2019 at 3:40 PM Fæ  wrote:
>
> Thanks to everyone who helped sort this out.
>
> In some ways, the vandalism neatly demonstrates how Wikimedia projects
> rely on trust. When these things happen, it is a nice reminder that
> our open values mean that we should take a light approach to security
> whenever the potential exposure is always going to be recoverable.
> Resilience rather than impenetrable, for our community at least, is a
> healthy way to prioritize. The occasional predictable idiot is no
> reason to change that approach.
>
> Cheers,
> Fae
>
> On Tue, 19 Mar 2019 at 13:28, Alexandros Kosiaris
>  wrote:
> >
> > Gerrit is back up. Almost all of the vandalism has been cleaned up;
> > some minor stuff remains, and we will clean that up as well.
> >
> > On Tue, Mar 19, 2019 at 1:42 PM planetenxin  wrote:
> > >
> > > Am 19.03.2019 um 12:21 schrieb Andre Klapper:
> > > > planetenxin: Sorry for my previous message, was not meant to be rude.
> > >
> > > no worries. Hope, that Gerrit is back alive soon. :-)
> > >
> > > ___
> > > Wikitech-l mailing list
> > > Wikitech-l@lists.wikimedia.org
> > > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
> --
> fae...@gmail.com https://commons.wikimedia.org/wiki/User:Fae
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l



-- 
Alexandros Kosiaris
Senior Site Reliability Engineer
Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Gerrit outage

2019-03-19 Thread Alexandros Kosiaris
Gerrit is back up. Almost all of the vandalism has been cleaned up;
some minor stuff remains, and we will clean that up as well.

On Tue, Mar 19, 2019 at 1:42 PM planetenxin  wrote:
>
> Am 19.03.2019 um 12:21 schrieb Andre Klapper:
> > planetenxin: Sorry for my previous message, was not meant to be rude.
>
> no worries. Hope, that Gerrit is back alive soon. :-)
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l



-- 
Alexandros Kosiaris
Senior Site Reliability Engineer
Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Code Stewardship Reviews candidates open for comment

2018-12-17 Thread Alexandros Kosiaris
Many many thanks!

On Mon, Dec 17, 2018 at 7:21 AM Jean-Rene Branaa  wrote:
>
> Hello Alexandros,
>
> Makes sense.  We'll extend the current review cycle until January 16th,
> 2019 in order to allow more time due to the coming holidays.
>
> Cheers,
>
> JR
>
>
>
> On Sun, Dec 16, 2018 at 00:30 Alexandros Kosiaris 
> wrote:
>
> > Hi,
> >
> > Any chance the Dec 28th deadline could be altered? It's the end of the
> > quarter, and with goal wrap-up, PTOs, holidays, etc., I find it a bit
> > difficult to collect all the necessary data to help make an informed
> > decision.
> >
> > On Thu, Dec 13, 2018 at 21:00 Jean-Rene Branaa <
> > jbra...@wikimedia.org> wrote:
> >
> > > Hello All,
> > >
> > > We've opened the feedback cycle for the current Code Stewardship
> > > Review candidates.  They include:
> > >
> > > CodeReview extension[0]
> > > UserMerge extension[1]
> > > Graphoid service[2]
> > >
> > > Feedback can be provided via the talk pages for each of the items
> > > under review and/or their associated Phabricator tasks.
> > >
> > > The Code Stewardship review process[3] is intended to help address
> > > code deployed to production that is un/under funded.  The outcome of
> > > this process is generally one of three:  re-investment, no change, or
> > > sunset (ramp down all investment and remove from production
> > > environment).
> > >
> > > Please provide your feedback before December 28th.
> > >
> > > Cheers,
> > >
> > > JR
> > > IRC: jrbranaa
> > > Release Engineering/Code Health
> > >
> > >
> > > [0]
> > >
> > https://www.mediawiki.org/wiki/Code_stewardship_reviews/Feedback_solicitation/CodeReview
> > > [1]
> > >
> > https://www.mediawiki.org/wiki/Code_stewardship_reviews/Feedback_solicitation/UserMerge
> > > [2]
> > >
> > https://www.mediawiki.org/wiki/Code_stewardship_reviews/Feedback_solicitation/Graphoid
> > > [3]https://www.mediawiki.org/wiki/Code_stewardship_reviews
> > >
> > > ___
> > > Wikitech-l mailing list
> > > Wikitech-l@lists.wikimedia.org
> > > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> > ___
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l



-- 
Alexandros Kosiaris 

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Code Stewardship Reviews candidates open for comment

2018-12-16 Thread Alexandros Kosiaris
Hi,

Any chance the Dec 28th deadline could be altered? It's the end of the
quarter, and with goal wrap-up, PTOs, holidays, etc., I find it a bit
difficult to collect all the necessary data to help make an informed
decision.

On Thu, Dec 13, 2018 at 21:00 Jean-Rene Branaa <
jbra...@wikimedia.org> wrote:

> Hello All,
>
> We've opened the feedback cycle for the current Code Stewardship
> Review candidates.  They include:
>
> CodeReview extension[0]
> UserMerge extension[1]
> Graphoid service[2]
>
> Feedback can be provided via the talk pages for each of the items
> under review and/or their associated Phabricator tasks.
>
> The Code Stewardship review process[3] is intended to help address
> code deployed to production that is un/under funded.  The outcome of
> this process is generally one of three:  re-investment, no change, or
> sunset (ramp down all investment and remove from production
> environment).
>
> Please provide your feedback before December 28th.
>
> Cheers,
>
> JR
> IRC: jrbranaa
> Release Engineering/Code Health
>
>
> [0]
> https://www.mediawiki.org/wiki/Code_stewardship_reviews/Feedback_solicitation/CodeReview
> [1]
> https://www.mediawiki.org/wiki/Code_stewardship_reviews/Feedback_solicitation/UserMerge
> [2]
> https://www.mediawiki.org/wiki/Code_stewardship_reviews/Feedback_solicitation/Graphoid
> [3]https://www.mediawiki.org/wiki/Code_stewardship_reviews
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Datacenter Switchback recap

2018-10-11 Thread Alexandros Kosiaris
A minor correction:

> During the most critical part of the switch
> today, the wikis were in read-only mode for a duration of 4 minutes
> and 41 seconds.

This was yesterday, not today.

-- 
Alexandros Kosiaris 

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Datacenter Switchback recap

2018-10-11 Thread Alexandros Kosiaris
Hello everyone,

Today we've concluded the successful migration of our wikis (MediaWiki
and associated services) from our secondary datacenter (codfw) back to
the primary one (eqiad). During the most critical part of the switch
today, the wikis were in read-only mode for a duration of 4 minutes
and 41 seconds. That's a significant improvement over the 7 mins and
34 seconds we achieved during the inverse process we concluded a month
ago, which was already significantly better than last year. I'd like
to believe that this is the result of the increasing experience we are
building and the trust we are putting in the process and tools we have
developed for this.

Although the switchback process itself has been largely automated and
went pretty smoothly, there were some issues that we experienced:

- CentralNotice banners stayed online for a longer time than necessary
due to miscommunication issues. This has now been documented and will
be avoided in the future.

- After the switchback we experienced increased load on all our
MediaWiki application servers. The root cause has been identified and
a mitigation will be put in place; in short, parsercache replication
between the 2 datacenters was not working.

- Last but not least, and probably the most important issue of all, a
data inconsistency was detected in wikidata (s8): some articles were
present in codfw but had not been replicated to eqiad. We are
still investigating the root cause of this while applying corrective
actions to mitigate the user impact as quickly as possible.

All wikis are now served from our primary data center again.

Should you experience any issue that is deemed related to the
switchover process, please feel free to file a ticket in Phabricator
and tag it with the Datacenter-Switchover-2018 project tag[1]. We will
monitor this tag closely and keep any and all issues updated.

We'd like to thank everyone for their hard work in ensuring any
(potential) issues got resolved timely, for automating the process
whenever and wherever possible, and for making this datacenter
switchover and switchback a success!

-- 
Alexandros Kosiaris 

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Datacenter switchover and switchback

2018-10-05 Thread Alexandros Kosiaris
I am sorry to hear that. It looks like something that we will have to
take into account for the next switchovers. That being said, we had
deliberations across the involved teams months ago to come up with
those exact dates and have been communicating them via at least SoS
since 2018-08-01.

I am curious about something though: how does the deployment train
(because that's what we are talking about) impact the software release,
exactly?
On Wed, Oct 3, 2018 at 6:19 PM C. Scott Ananian  wrote:
>
> Oct 8 seems to be a particularly bad time to freeze the train given that we
> are forking for the MW 1.32 release on Oct 15, and a lot of folks have
> last-minute things they want to get into the release (eg deprecations, etc).
>   --scott
>
> On Thu, Aug 30, 2018 at 10:57 AM Pine W  wrote:
>
> > +1 to DJ's question about timing. Also, one might wish to be mindful of
> > the number of recent trains that were supposed to be boring but involved
> > interesting surprises; this makes me wonder whether trains that one thinks
> > will be boring are actually OK in this circumstance even if they turn out
> > to be "interesting".
> >
> > Pine
> > ( https://meta.wikimedia.org/wiki/User:Pine )
> >
> >
> >
> >  Original message From: Derk-Jan Hartman <
> > d.j.hartman+wmf...@gmail.com> Date: 8/30/18  2:54 AM  (GMT-08:00) To:
> > Wikimedia developers  Subject: Re:
> > [Wikitech-l] Datacenter switchover and switchback
> > While I think these regular switches are a very good idea, from an outside
> > perspective I do have to question a process that puts a significant plug in
> > the velocity of various teams working on major projects (esp. in a time of
> > year that could probably be seen as one of the most productive). What are
> > plans to reduce the disruption of this exercise in the future ?
> >
> > DJ
> >
> > On Thu, Aug 30, 2018 at 8:38 AM Jaime Crespo 
> > wrote:
> >
> > > Let me explain the rationale of the below request for clarification:
> > >
> > > On Wed, Aug 29, 2018 at 11:30 PM MA  wrote:
> > >
> > > > Hello:
> > > >
> > > > >For the duration of the switchover (1 month), deployers are kindly
> > > > >requested to refrain from large db schema changes and avoid deploying
> > > > >any kind of new feature that requires creation of tables.
> > > > >There will be a train freeze in the week of Sept 10th and Oct 8th.
> > >
> > >
> > > During the failover, some schema changes will be finalized on the current
> > > active datacenter (plus some major server and network maintenance may be
> > > done)- our request is mostly to refrain from quickly enabling those large
> > > new unlocked features (e.g. the ongoing comment refactoring, actor/user
> > > refactoring, Multi Content Revision, JADE, major wikidata or structured
> > > comons structure changes, new extensions not ever deployed to the
> > cluster,
> > > etc.) at the same time than the ongoing maintenance to reduce variables
> > of
> > > things that can go bad- enabling those features may be unblocked during
> > the
> > > switchover time, but we ask you to hold until being back on the current
> > > active datacenter. Basically, ask yourself if you are enabling a large
> > new
> > > core feature or want to start a heavy-write maintenance script and there
> > is
> > > a chance you will need DBA/system support. Sadly, we had some instances
> > of
> > > this happening last year and we want to explicitly discourage this during
> > > these 2 weeks.
> > >
> > > In own my opinion, enabling existing features on smaller projects (size
> > > here is in amount of server resources, not that they are less important)
> > is
> > > equivalent to a swat change, and I am not against it happening. I would
> > ask
> > > contributors to use their best judgement on every case, and ask people on
> > > the #DBA tag on phabricator or add me as reviewers on gerrit if in doubt.
> > > My plea is to not enable major structural changes during that time may
> > > affect thousands of edits per minute. Swat-like changes and "boring" :-)
> > > trains are ok.
> > >
> > > For new wiki creations I would prefer if those were delayed but CC #DBA s
> > > on the phabricator task to check with us.
> > > ___
> > > Wikitech-l mailing list
> > > Wikitech-l@lists.wikimedia.org
> > > https://lists.wikimedia.org/m

Re: [Wikitech-l] Datacenter switchover and switchback

2018-10-03 Thread Alexandros Kosiaris
Reminder: The switchback is next week.
On Wed, Aug 29, 2018 at 8:28 PM Alexandros Kosiaris
 wrote:
>
> Hello everyone,
>
> This is to inform you that there will be a datacenter switchover and
> switchback over the next few weeks. The timelines are:
>
> Services: Tuesday, September 11th 2018 14:30 UTC
> Media storage/Swift: Tuesday, September 11th 2018 15:00 UTC
> Traffic: Tuesday, September 11th 2018 19:00 UTC
> MediaWiki: Wednesday, September 12th 2018: 14:00 UTC
>
> Switchback:
>
> Traffic: Wednesday, October 10th 2018 09:00 UTC
> MediaWiki: Wednesday, October 10th 2018: 14:00 UTC
> Services: Thursday, October 11th 2018 14:30 UTC
> Media storage/Swift: Thursday, October 11th 2018 15:00 UTC
>
> For the duration of the switchover (1 month), deployers are kindly
> requested to refrain from large db schema changes and avoid deploying
> any kind of new feature that requires creation of tables.
> There will be a train freeze in the week of Sept 10th and Oct 8th.
>
> The net effect of the switchover and switchback for volunteers is
> expected to be some minutes of inability to save an edit. For readers,
> everything will be as usual.
>
> The tracking task for interested parties is
> https://phabricator.wikimedia.org/T199073
>
> Regards,
>
> --
> Alexandros Kosiaris 



-- 
Alexandros Kosiaris 

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Datacenter Switchover recap

2018-09-12 Thread Alexandros Kosiaris
Hello all,

Today we've successfully migrated our wikis (MediaWiki and associated
services) from our primary data center (eqiad) to our secondary (codfw),
an exercise
we've done for the 3rd year in a row. During the most critical part of the
switch today, the wikis were in read-only mode for a duration of 7 and a
half minutes - a significant improvement from last year.

Although the switchover process itself has been largely automated and went
pretty smoothly once started, we did experience some issues leading up to
our maintenance window, which caused us to delay the switch somewhat:

- In the days before the switch a performance issue in the Translate
extension for CentralNotice had been discovered, which was expected to
cause database stampede issues during the switch, and we decided to
mitigate this by temporarily disabling the
extension for the duration of the switchover process. However, it's now
understood that this may have caused some unwanted side effects and should
be avoided in the future in favor of other methods.

- Right before the switchover commenced, an eqiad Varnish server
misbehaved, causing a high spike of failed requests. Thankfully the SRE
Traffic team identified and addressed the issue promptly, allowing the
switchover to proceed.

- Two codfw s7 database slaves crashed right before the start of our
maintenance window. This delayed the start of our switchover procedure by
approximately 30 minutes as we investigated cause and impact.

- The ElasticSearch search cluster traffic did not follow MediaWiki traffic
from eqiad to codfw during the switch as expected, but stayed in our primary
data center instead. Investigation showed that ElasticSearch had been
manually hardcoded to eqiad in its configuration. This was rectified after
the switchover was complete with a configuration change and a manual switch
to codfw.

- After the switchover completed we experienced some repetitive database
load spikes, primarily on the codfw s1 cluster (serving English Wikipedia).
The DBA team performed a series of fine-tuning and other corrective actions.

All wikis are now served from our secondary codfw data center, and this is
expected to stay that way for the next 4 weeks, after which we will reverse
this procedure.

Should you experience any issue that is deemed related to the switchover
process, please feel free to file a ticket in Phabricator and tag it with
the Datacenter-Switchover-2018 project tag[1]. We will monitor this tag
closely and keep any and all issues updated.

We'd like to thank everyone for their hard work in ensuring any (potential)
issues were resolved in a timely manner, for automating the process whenever
and wherever possible, and for making this datacenter switch a success!

[1] https://phabricator.wikimedia.org/project/profile/3571/

-- 
Alexandros Kosiaris 
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Datacenter switchover and switchback

2018-08-30 Thread Alexandros Kosiaris
On Thu, Aug 30, 2018 at 12:55 PM Derk-Jan Hartman
 wrote:
>
> While I think these regular switches are a very good idea, from an outside
> perspective I do have to question a process that puts a significant damper on
> the velocity of various teams working on major projects (esp. in a time of
> year that could probably be seen as one of the most productive).

That is absolutely true.
> What are the
> plans to reduce the disruption of this exercise in the future?

There are indeed plans to make this easier, more automated, and shorter in
duration. Just to name a few: MediaWiki no longer requires deployments to
switch datacenters but instead relies on a state key in the etcd database;
there is a library plus cookbooks to automate the currently automatable
steps; and there is work underway to make the documentation up to date,
accurate, and usable by more people at shorter notice. That being said, it's
never going to be free, but the toil is significant enough that we want to
reduce it, and that is what we aim for.
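
To make the etcd bit concrete, the general idea looks something like this
minimal sketch (Python 3, stdlib only; the host, port and key path are
invented for illustration, not our actual layout):

    import json
    import urllib.request

    # Hypothetical etcd v2 key holding the name of the primary datacenter.
    ETCD_KEY_URL = "http://etcd.example.org:2379/v2/keys/mediawiki/primary-dc"

    def primary_datacenter() -> str:
        """Ask etcd which datacenter is currently primary."""
        with urllib.request.urlopen(ETCD_KEY_URL, timeout=2) as resp:
            return json.load(resp)["node"]["value"]

    # Code consults the state key instead of baked-in configuration, so a
    # switchover becomes a key update rather than a redeployment.
    if primary_datacenter() == "codfw":
        print("routing writes to the codfw masters")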


>
> DJ
>
> On Thu, Aug 30, 2018 at 8:38 AM Jaime Crespo  wrote:
>
> > Let me explain the rationale behind the below request for clarification:
> >
> > On Wed, Aug 29, 2018 at 11:30 PM MA  wrote:
> >
> > > Hello:
> > >
> > > >For the duration of the switchover (1 month), deployers are kindly
> > > >requested to refrain from large db schema changes and avoid deploying
> > > >any kind of new feature that requires creation of tables.
> > > >There will be a train freeze in the weeks of Sept 10th and Oct 8th.
> >
> >
> > During the failover, some schema changes will be finalized on the current
> > active datacenter (plus some major server and network maintenance may be
> > done). Our request is mostly to refrain from quickly enabling those large,
> > newly unlocked features (e.g. the ongoing comment refactoring, actor/user
> > refactoring, Multi Content Revision, JADE, major Wikidata or structured
> > Commons structure changes, extensions never before deployed to the cluster,
> > etc.) at the same time as the ongoing maintenance, to reduce the number of
> > variables that could go bad. Enabling those features may be unblocked during
> > the switchover time, but we ask you to hold off until we are back on the
> > current active datacenter. Basically, ask yourself if you are enabling a
> > large new core feature or want to start a heavy-write maintenance script
> > and there is a chance you will need DBA/system support. Sadly, we had some
> > instances of this happening last year and we want to explicitly discourage
> > this during these 2 weeks.
> >
> > In my own opinion, enabling existing features on smaller projects (size
> > here is in amount of server resources, not that they are less important) is
> > equivalent to a swat change, and I am not against it happening. I would ask
> > contributors to use their best judgement in every case, and to ask people
> > via the #DBA tag on Phabricator or add me as a reviewer on Gerrit if in
> > doubt. My plea is to not enable major structural changes that may affect
> > thousands of edits per minute during that time. Swat-like changes and
> > "boring" :-) trains are ok.
> >
> > For new wiki creations, I would prefer that those be delayed, but CC #DBAs
> > on the Phabricator task to check with us.
> > _______
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l



-- 
Alexandros Kosiaris 

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Datacenter switchover and switchback

2018-08-29 Thread Alexandros Kosiaris
Hello everyone,

This is to inform you that there will be a datacenter switchover and
switchback in the next few weeks. The timelines are:

Services: Tuesday, September 11th 2018 14:30 UTC
Media storage/Swift: Tuesday, September 11th 2018 15:00 UTC
Traffic: Tuesday, September 11th 2018 19:00 UTC
MediaWiki: Wednesday, September 12th 2018: 14:00 UTC

Switchback:

Traffic: Wednesday, October 10th 2018 09:00 UTC
MediaWiki: Wednesday, October 10th 2018: 14:00 UTC
Services: Thursday, October 11th 2018 14:30 UTC
Media storage/Swift: Thursday, October 11th 2018 15:00 UTC

For the duration of the switchover (1 month), deployers are kindly
requested to refrain from large db schema changes and avoid deploying
any kind of new feature that requires creation of tables.
There will be a train freeze in the weeks of Sept 10th and Oct 8th.

The net effect of the switchover and switchback for volunteers is
expected to be some minutes of inability to save an edit. For readers,
everything will be as usual.

The tracking task for interested parties is
https://phabricator.wikimedia.org/T199073

Regards,

-- 
Alexandros Kosiaris 

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] grafana.wikimedia.org migration to the native LDAP implementation on June 27th

2018-06-27 Thread Alexandros Kosiaris
Hello everyone,

The migration is complete. grafana.wikimedia.org now supports the
"Sign-in" action (accessible from the top-left button), which allows
editing dashboards by providing LDAP credentials.

grafana-admin.wikimedia.org is deprecated, no longer works as it
used to, and will be removed at an unspecified later point in time. I
would urge everyone to update bookmarks, links and anything else using
it to point to grafana.wikimedia.org instead.

Don't hesitate to reach out in
https://phabricator.wikimedia.org/T170150 if you experience any
problems.

Regards,
On Wed, Jun 27, 2018 at 12:06 PM Alexandros Kosiaris
 wrote:
>
> FYI, this is happening today.
> On Wed, Jun 13, 2018 at 2:14 PM Alexandros Kosiaris
>  wrote:
> >
> > Hello everyone,
> >
> > If you don't edit/create dashboards on grafana.wikimedia.org feel free
> > to skip the rest of this email, it does not affect you. Those of you
> > who do, please read on.
> >
> > Our grafana installation dates back quite a bit and it has a custom
> > LDAP authentication and authorization implementation. Nowadays Grafana
> > has native support for LDAP and can query LDAP groups. After some
> > lengthy evaluation and problem solving in
> > https://phabricator.wikimedia.org/T170150, the SRE team feels ready to
> > deprecate the custom LDAP implementation and migrate to the native
> > one. What does this mean?
> >
> > * grafana-admin.wikimedia.org (the entry point for users to
> > authenticate and create dashboards) will be removed
> > * grafana.wikimedia.org WILL remain as is and will change only very
> > slightly, specifically the login menu part (see below)
> > * You will be logging in using a menu on the left-hand side of
> > grafana.wikimedia.org using the exact same credentials as before.
> > * The grafana database user attributes email and name will be
> > automatically populated on login from LDAP and enforced. This is
> > expected to be irrelevant for almost everyone, but if you've manually
> > edited your profile in grafana, expect those 2 fields to be overridden
> > with data from LDAP.
> >
> > The date for the migration is June 27th 09:00 UTC and the maintenance
> > window will be 3 hours long. We don't expect it to really last that
> > long, however. Expect grafana.wikimedia.org downtime during that
> > timeframe.
> >
> > Regards,
> >
> > --
> > Alexandros Kosiaris 
>
>
>
> --
> Alexandros Kosiaris 



-- 
Alexandros Kosiaris 

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] grafana.wikimedia.org migration to the native LDAP implementation on June 27th

2018-06-27 Thread Alexandros Kosiaris
FYI, this is happening today.
On Wed, Jun 13, 2018 at 2:14 PM Alexandros Kosiaris
 wrote:
>
> Hello everyone,
>
> If you don't edit/create dashboards on grafana.wikimedia.org feel free
> to skip the rest of this email, it does not affect you. Those of you
> who do, please read on.
>
> Our grafana installation dates back quite a bit and it has a custom
> LDAP authentication and authorization implementation. Nowadays Grafana
> has native support for LDAP and can query LDAP groups. After some
> lengthy evaluation and problem solving in
> https://phabricator.wikimedia.org/T170150, the SRE team feels ready to
> deprecate the custom LDAP implementation and migrate to the native
> one. What does this mean?
>
> * grafana-admin.wikimedia.org (the entry point for users to
> authenticate and create dashboards) will be removed
> * grafana.wikimedia.org WILL remain as is and will change only very
> slightly, specifically the login menu part (see below)
> * You will be logging in using a menu on the left-hand side of
> grafana.wikimedia.org using the exact same credentials as before.
> * The grafana database user attributes email and name will be
> automatically populated on login from LDAP and enforced. This is
> expected to be irrelevant for almost everyone, but if you've manually
> edited your profile in grafana, expect those 2 fields to be overridden
> with data from LDAP.
>
> The date for the migration is June 27th 09:00 UTC and the maintenance
> window will be 3 hours long. We don't expect it to really last that
> long, however. Expect grafana.wikimedia.org downtime during that
> timeframe.
>
> Regards,
>
> --
> Alexandros Kosiaris 



-- 
Alexandros Kosiaris 

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] grafana.wikimedia.org migration to the native LDAP implementation on June 27th

2018-06-13 Thread Alexandros Kosiaris
Hello everyone,

If you don't edit/create dashboards on grafana.wikimedia.org feel free
to skip the rest of this email, it does not affect you. Those of you
who do, please read on.

Our grafana installation dates back quite a bit and it has a custom
LDAP authentication and authorization implementation. Nowadays Grafana
has native support for LDAP and can query LDAP groups. After some
lengthy evaluation and problem solving in
https://phabricator.wikimedia.org/T170150, the SRE team feels ready to
deprecate the custom LDAP implementation and migrate to the native
one. What does this mean?

* grafana-admin.wikimedia.org (the entry point for users to
authenticate and create dashboards) will be removed
* grafana.wikimedia.org WILL remain as is and will change only very
slightly, specifically the login menu part (see below)
* You will be logging in using a menu on the left-hand side of
grafana.wikimedia.org using the exact same credentials as before.
* The grafana database user attributes email and name will be
automatically populated on login from LDAP and enforced. This is
expected to be irrelevant for almost everyone, but if you've manually
edited your profile in grafana, expect those 2 fields to be overridden
with data from LDAP.
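
For the curious, "native support for LDAP that can query LDAP groups" boils
down to something like the following sketch (using the Python ldap3 library;
the server, DNs and group names are invented, not our actual configuration):

    from ldap3 import Server, Connection

    SERVER = Server("ldap.example.org", use_ssl=True)  # hypothetical host
    USER_DN = "uid=jdoe,ou=people,dc=example,dc=org"   # hypothetical user

    # Bind as the user to verify their credentials...
    conn = Connection(SERVER, user=USER_DN, password="secret")
    if conn.bind():
        # ...then look up the groups they belong to, which Grafana can map
        # to its Viewer/Editor/Admin roles.
        conn.search("ou=groups,dc=example,dc=org",
                    "(&(objectClass=groupOfNames)(member=%s))" % USER_DN,
                    attributes=["cn"])
        print("member of:", [entry.cn.value for entry in conn.entries])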

The date for the migration is June 27th 09:00 UTC and the maintenance
window will be 3 hours long. We don't expect it to really last that
long, however. Expect grafana.wikimedia.org downtime during that
timeframe.

Regards,

-- 
Alexandros Kosiaris 

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Wikimedia-l] Τι σας κάνει ευτυχισμένη αυτήν την εβδομάδα? / What's making you happy this week? (Week of 22 April 2018)

2018-04-30 Thread Alexandros Kosiaris
Aside from the actual content (it's nice to see the legal case in
Greece ending), seeing the subject in Greek was one more reason I
became happy. But I think a small correction is in order. A more
appropriate way of saying "What's making you happy this week?" would
be "Τι σας κάνει ευτυχείς αυτήν την εβδομάδα;", where "ευτυχείς" (an
adjective) is the plural form of happy, which is used both when
addressing groups of people and when being polite. Alternatively,
"ευτυχισμένους" could be used, with the exact same meaning, just using
the participle form (in the appropriate declension) instead of
the adjective. "Χαρούμενους" (again a participle, just of a different
verb) would also be valid with the same meaning for most people,
although if one wants to be pedantic, "χαρά" is closer to "joy" than
"happiness".

Regards,

-- 
Alexandros Kosiaris <akosia...@wikimedia.org>

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Reboot of irc.wikimedia.org

2018-02-22 Thread Alexandros Kosiaris
Hello everyone

This has just happened. Most bots seem to have already reconnected
successfully and so has the rc-pmtpa bot. Everything seems to be in
order.

Regards,

On Thu, Jan 25, 2018 at 9:18 PM, Alexandros Kosiaris
<akosia...@wikimedia.org> wrote:
> Hi,
>
> On Sun, Jan 21, 2018 at 9:32 PM, MZMcBride <z...@mzmcbride.com> wrote:
>> Alexandros Kosiaris wrote:
>>>This is to inform you that on Monday Feb 22nd 2018 ~10:00 UTC, the
>>>infrastructure powering irc.wikimedia.org will be rebooted for
>>>security upgrades. This is expected to only impact bots that are using
>>>irc.wikimedia.org AND are not able to automatically reconnect on
>>connection failure. From recent experience (the equipment was rebooted
>>210 days ago last time, with no fallout) those are very limited in number
>>>these days.
>>
>> Thank you for this notice.
>
> Sorry for not answering sooner, I've just seen this.
>
>>
>> Do you know if it's still the case that per-wiki channels on
>> irc.wikimedia.org are not available/able to be joined until there's
>> activity on that wiki?
>
> I did not know that, but I just researched it and that is still the
> case. For future reference the code in question is in
> https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/mw_rc_irc/files/udpmxircecho.py;b2d13fbc838a7f994a2e97fdffc07f15fe88bc29$49
>
>
>> My memory is that reconnecting to irc.wikimedia.org
>> has not usually been the issue (as you note), but instead it's been that
>> there's a speaking bot account on the network that would only create a
>> channel after there's some activity to report about that wiki. For larger
>> wikis with lots of activity, this means the channels get created nearly
>> instantly after a server restart. For smaller wikis with little activity,
>> this means that the channels may not get re-created for days or even weeks.
>
> I have no such recollection but it does make sense.
>
>>
>> I just tested irc.wikimedia.org again and it appears that joining/creating
>> arbitrary channels is not allowed. This makes me think that bot accounts
>> and others would be disallowed from joining small/quiet wiki channels
> until those channels are re-created by the server/rc-pmtpa, unless some
>> kind of whitelist or workaround has been implemented.
>
> That's true. But I don't expect this to cause any kind of major
> problems and experience up to now supports that. But thanks for
> bringing it up. I learned something today.
>
> --
> Alexandros Kosiaris <akosia...@wikimedia.org>



-- 
Alexandros Kosiaris <akosia...@wikimedia.org>

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Reboot of irc.wikimedia.org

2018-01-25 Thread Alexandros Kosiaris
Hi,

On Sun, Jan 21, 2018 at 9:32 PM, MZMcBride <z...@mzmcbride.com> wrote:
> Alexandros Kosiaris wrote:
>>This is to inform you that on Monday Feb 22nd 2018 ~10:00 UTC, the
>>infrastructure powering irc.wikimedia.org will be rebooted for
>>security upgrades. This is expected to only impact bots that are using
>>irc.wikimedia.org AND are not able to automatically reconnect on
>>connection failure. From recent experience (the equipment was rebooted
>>210 days ago last time, with no fallout) those are very limited in number
>>these days.
>
> Thank you for this notice.

Sorry for not answering sooner, I've just seen this.

>
> Do you know if it's still the case that per-wiki channels on
> irc.wikimedia.org are not available/able to be joined until there's
> activity on that wiki?

I did not know that, but I just researched it and that is still the
case. For future reference the code in question is in
https://phabricator.wikimedia.org/source/operations-puppet/browse/production/modules/mw_rc_irc/files/udpmxircecho.py;b2d13fbc838a7f994a2e97fdffc07f15fe88bc29$49
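
For readers who don't want to dig through the puppet repo, the behaviour
boils down to something like this heavily simplified sketch (the port and
message format here are illustrative, not the real ones):

    import socket

    joined = set()  # per-wiki channels the bot has joined so far

    def relay(datagram: bytes, irc_send) -> None:
        # Messages arrive as "#channel<TAB>text"; the bot only joins --
        # and thereby creates -- a channel when its first message arrives.
        channel, text = datagram.decode("utf-8", "replace").split("\t", 1)
        if channel not in joined:
            irc_send("JOIN " + channel)
            joined.add(channel)
        irc_send("PRIVMSG %s :%s" % (channel, text))

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("127.0.0.1", 9390))  # illustrative port
    while True:
        data, _addr = sock.recvfrom(65535)
        relay(data, print)  # print() stands in for a real IRC connection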


> My memory is that reconnecting to irc.wikimedia.org
> has not usually been the issue (as you note), but instead it's been that
> there's a speaking bot account on the network that would only create a
> channel after there's some activity to report about that wiki. For larger
> wikis with lots of activity, this means the channels get created nearly
> instantly after a server restart. For smaller wikis with little activity,
> this means that the channels may not get re-created for days or even weeks.

I have no such recollection but it does make sense.

>
> I just tested irc.wikimedia.org again and it appears that joining/creating
> arbitrary channels is not allowed. This makes me think that bot accounts
> and others would be disallowed from joining small/quiet wiki channels
> until those channels are re-created by the server/rc-pmtpa, unless some
> kind of whitelist or workaround has been implemented.

That's true. But I don't expect this to cause any kind of major
problems and experience up to now supports that. But thanks for
bringing it up. I learned something today.

-- 
Alexandros Kosiaris <akosia...@wikimedia.org>

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Reboot of irc.wikimedia.org

2018-01-18 Thread Alexandros Kosiaris
Hello everyone,

This is to inform you that on Monday Feb 22nd 2018 ~10:00 UTC, the
infrastructure powering irc.wikimedia.org will be rebooted for
security upgrades. This is expected to only impact bots that are using
irc.wikimedia.org AND are not able to automatically reconnect on
connection failure. From recent experience (the equipment was rebooted
210 days ago last time, with no fallout) those are very limited in number
these days.

Regards,

-- 
Alexandros Kosiaris <akosia...@wikimedia.org>

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Setting up multiple Parsoid servers behind load balancer

2017-06-09 Thread Alexandros Kosiaris
Hi James,

I don't know if you have noticed the following in C. Scott's response

> At any rate: in your configurations you have URL and HTTPProxy set to the
> exact same string.  This is almost certainly not right.  I believe if you
> just omit the proxy lines entirely from the configuration you'll find
> things work as you expect.
>  --scott

but I could not help noticing the error too. AFAIK, setting these
variables instructs both pieces of software to use http://192.168.56.63:8001/
as a forward proxy, which is NOT what you have there. HAProxy is reverse
proxy software, not a forward proxy (although you can abuse it to
achieve that functionality). In the setup you describe there is no
need for forward proxies, so neither Parsoid nor MediaWiki needs a proxy
configuration.
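
To illustrate the difference (a sketch using the Python requests library;
the backend hostname is made up, the balancer address is the one from your
message):

    import requests

    # Forward proxy -- what the HTTPProxy/proxy settings configure: the
    # client asks the proxy to fetch some *other* URL on its behalf.
    requests.get("http://parsoid-backend.example:8000/",
                 proxies={"http": "http://192.168.56.63:8001/"})

    # Reverse proxy -- what HAProxy actually is: the client simply talks
    # to the balancer's own address and the balancer picks a backend.
    requests.get("http://192.168.56.63:8001/")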

I also don't think you need RESTBase, as long as you are willing to
wait for Parsoid to finish parsing and return the result. It should
be fine for small articles, but as these grow larger, you will start
having various performance-related problems (for example, you might
have to adjust HAProxy timeouts). But from what I gather, you are not
there yet.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Setting up a new Tomcat servlet in production?

2016-10-24 Thread Alexandros Kosiaris
On Thu, Oct 20, 2016 at 10:20 AM, 魔法設計師 <shoichi.c...@gmail.com> wrote:
> 2016-10-19 0:45 GMT+08:00 Alexandros Kosiaris <akosia...@wikimedia.org>:
>>
>> Hello,
>>
>> With the preamble of my opinion not being an authoritative point of
>> view at all, I should point out that Java/JVM based services are not
>> especially loved in WMF. Ops does not feel it has the capability of
>> supporting them. There are a few around like Gerrit, Cassandra,
>> ElasticSearch, Kafka but none of these is actually maintained by ops.
>> All of these have owners/maintainers outside of ops (entire teams in
>> some cases), with varying degrees of success. The question of whether
> > it should be Tomcat or Jetty is a valid one, but serves to alleviate
> > only part of the problem (it's not like Ops hates Tomcat but likes
>> Jetty). So, there are probably a few social/administrative issues that
>> it might make sense to address first before handling the technical
>> part.
>
> I think what you mean is: if the service goes online, a maintainer or a
> team is needed for administration, just like Gerrit, Cassandra, and
> ElasticSearch have. Not maintained by Ops. The team needs to be set up
> first. Am I right?

Yes, that's the very least IMHO.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Setting up a new Tomcat servlet in production?

2016-10-18 Thread Alexandros Kosiaris
Hello,

With the preamble of my opinion not being an authoritative point of
view at all, I should point out that Java/JVM based services are not
especially loved in WMF. Ops does not feel it has the capability of
supporting them. There are a few around like Gerrit, Cassandra,
ElasticSearch, Kafka but none of these is actually maintained by ops.
All of these have owners/maintainers outside of ops (entire teams in
some cases), with varying degrees of success. The question of whether
it should be Tomcat or Jetty is a valid one, but serves to alleviate
only part of the problem (it's not like Ops hates Tomcat but likes
Jetty). So, there are probably a few social/administrative issues that
it might make sense to address first before handling the technical
part.

On Mon, Oct 17, 2016 at 12:13 PM, Adam Wight <awi...@wikimedia.org> wrote:
> Friends,
>
> I'm helping review a tool <https://www.mediawiki.org/wiki/Extension:Ids>
> that I understand Wikimedia Taiwan is eager to use, which uses a parser
> hook to render ideographic description characters
> <https://en.wikipedia.org/wiki/Ideographic_Description_Characters_(Unicode_block)>
> into PNG glyphs in order to display historic or rare characters which
> aren't covered by Unicode.  It's very cool.
>
> The challenges are first that it's based on a Tomcat backend
> <https://github.com/Wikimedia-TW/han3_ji7_tsoo1_kian3_WM/blob/master/src/idsrend/services/IDSrendServlet.java>,
> which I'm not sure is precedented in our current ecosystem, and second that
> the code uses Chinese variable and function names, which should
> unfortunately be Anglicized by convention, AIUI.  Finally, there might be
> security issues around the rendered text itself, if it were misused to mask
> content.
>
> I'm mostly asking this list for help with the question of using Tomcat in
> production.
>
> Thanks,
> Adam
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l



-- 
Alexandros Kosiaris <akosia...@wikimedia.org>

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Engineering] [Analytics] Pageview API

2015-11-17 Thread Alexandros Kosiaris
Hello Bahodir,

On Tue, Nov 17, 2015 at 2:15 PM, Bahodir Mansurov
 wrote:
> Agree with everyone else, this is great!
>
> I just have a question. Is this an evolving thing, in the sense that more
> data sources will be used to define page views? Let me give an example. The
> Reading Web team is working on a new web app prototype that caches pages,
> which can then be viewed without hitting the back end. Since no request is
> made, no page view will be recorded.

This got me intrigued. I am wondering what exactly is meant by that.
Got something (a wiki page, doc, anything...) a curious being like me
could read?

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Engineering] Pageview API

2015-11-17 Thread Alexandros Kosiaris
It's nice to finally see this go live. Great work guys!!

On Mon, Nov 16, 2015 at 11:50 PM, Dan Andreescu
<dandree...@wikimedia.org> wrote:
> Dear Data Enthusiasts,
>
>
> In collaboration with the Services team, the analytics team wishes to
> announce a public Pageview API.  For an example of what kind of UIs someone
> could build with it, check out this excellent demo (code).
>
>
> The API can tell you how many times a wiki article or project is viewed over
> a certain period.  You can break that down by views from web crawlers or
> humans, and by desktop, mobile site, or mobile app.  And you can find the
> 1000 most viewed articles on any project, on any given day or month that we
> have data for.  We currently have data back through October and we will be
> able to go back to May 2015 when the loading jobs are all done.  For more
> information, take a look at the user docs.
>
>
> After many requests from the community, we were really happy to finally make
> this our top priority and get it done.  Huge thanks to Gabriel, Marko, Petr,
> and Eric from Services, Alexandros and all of Ops really, Henrik for
> maintaining stats.grok, and, of course, the many community members who have
> been so patient with us all this time.
>
>
> The Research team’s Article Recommender tool already uses the API to rank
> pages and determine relative importance.  Wiki Education Foundation’s
> dashboard is going to be using it to count how many times an article has
> been viewed since a student edited it.  And there are other grand plans for
> this data like “article finder”, which will find low-rated articles with a
> lot of pageviews; this can be used by editors looking for high-impact work.
> Join the fun, we’re happy to help get you started and listen to your ideas.
> Also, if you find bugs or want to suggest improvements, please create a task
> in Phabricator and tag it with #Analytics-Backlog.
>
>
> So what’s next?  We can think of too many directions to go into, for
> pageview data and Wikimedia project data, in general.  We need to work with
> you to make a great plan for the next few quarters.  Please chime in here
> with your needs.
>
>
> Team Analytics
>
>
> ___
> Engineering mailing list
> engineer...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/engineering
>
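
As a quick demonstration, fetching daily per-article counts takes only a few
lines (Python 3, stdlib only; the route below is the public per-article
endpoint, but do double-check the exact parameters against the user docs):

    import json
    import urllib.request

    URL = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
           "en.wikipedia/all-access/user/Albert_Einstein/daily/20151101/20151107")

    req = urllib.request.Request(URL, headers={"User-Agent": "pageview-demo/0.1"})
    with urllib.request.urlopen(req) as resp:
        for item in json.load(resp)["items"]:
            print(item["timestamp"], item["views"])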



-- 
Alexandros Kosiaris <akosia...@wikimedia.org>

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] etherpad.wikimedia.org upgrade on Monday 2015-06-15

2015-06-12 Thread Alexandros Kosiaris
Hello everyone,

On Monday UTC morning, the software powering etherpad.wikimedia.org,
etherpad-lite, will be upgraded to version 1.5.6-2 from 1.4.1-3. This
upgrade sets us back on track with etherpad-lite releases. Changelogs for
the interested are here:
https://github.com/ether/etherpad-lite/blob/develop/CHANGELOG.md

The reason for this heads-up is that after the upgrade, users will have to
force a full refresh in old pads they revisit in order to clear the browser
cache. Otherwise, in some revisited pads, a corrupted interface will show up
with a message about a missing cookie. The key sequence for this is highly
dependent on the browser: Ctrl+F5, F5, or Command+R, depending on browser/OS,
does the trick most times. Refer to your browser's documentation for the
accurate shortcut if you don't already know it.

Regards,

-- 
Alexandros Kosiaris akosia...@wikimedia.org
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Investigating building an apps content service using RESTBase and Node.js

2015-02-04 Thread Alexandros Kosiaris
> Good point. Ideally, what we would need to do is provide the right tools to
> developers to create services, which can then be placed strategically
> around DCs (in cooperation with Ops, ofc).

Yes. As an organization we should provide good tools that allow
developers to create services. I do fail to understand the
"strategically around DCs" part, though.

> For v1, however, we plan to
> provide only logical separation (to a certain extent) via modules which can
> be dynamically loaded/unloaded from RESTBase.

Modules? Care to explain a bit more? AFAIK RESTBase is a revision
storage service and, to be honest, I am struggling to understand what
modules you are referring to and the architecture behind those
modules.

> In return, RESTBase will
> provide them with routing, monitoring, caching and authorisation out of the
> box. The good point here is that this 'modularisation' eases the transition
> to a more-decomposed orchestration SOA model. Going in that direction,
> however, requires some prerequisites to be fulfilled, such as [1].

While revision caching can very well be done by RESTBase (AFAIK that
is one of the reasons it is being created), authorization (it's not
revision authorization, but generic authentication/authorization I am
referring to) and monitoring should not be provided by RESTBase to any
service. Especially monitoring. Services (whatever their nature)
should provide discoverable (REST if you like, as I suspect you do)
endpoints that allow monitoring via third-party tools, and not depend
on another service for that. My take is that there should be a
swagger manifest that describes a basic monitoring framework, and
services should each independently implement it (including RESTBase).
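
As a sketch of what I mean (the path and payload below are invented for
illustration; no such manifest has been agreed upon yet), every service
would independently serve something like:

    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer

    class MonitoringHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # One well-known endpoint that third-party monitoring tools
            # can poll on every service, RESTBase included.
            if self.path == "/_info":
                body = json.dumps({"name": "example-service",
                                   "version": "0.1",
                                   "healthy": True}).encode()
                self.send_response(200)
                self.send_header("Content-Type", "application/json")
                self.end_headers()
                self.wfile.write(body)
            else:
                self.send_response(404)
                self.end_headers()

    HTTPServer(("127.0.0.1", 8080), MonitoringHandler).serve_forever()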

I am also a bit unclear on the routing aspect. Care to point out an
up-to-date architectural diagram? I have been told in person that the
one at https://www.npmjs.com/package/restbase is not up to date, so I
can't comment on that.


-- 
Alexandros Kosiaris akosia...@wikimedia.org

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Current status of IPv6 connectivity?

2015-01-26 Thread Alexandros Kosiaris
Hello,

> Are there any reports that could allow one to check the status of IPv6
> connectivity in the WMF cluster? If not, could you let me know if IPv6
> is deployed on all servers and if you consider it stable?

IPv6 is mostly deployed on a per-server/service basis right now, but
we do consider it stable. Unfortunately, there is no way right now to
check the status of IPv6 connectivity, but there is a Tech Ops goal of
better monitoring, which includes plans to make IPv6 connectivity
checking a first-class citizen (same as IPv4).

> I'm seeing huge variations in IPv6 performance when connecting to WMF
> servers, while the rest of the internet seems to work and I'm trying
> to determine if the problem is on my side, somewhere on the way, or in
> the WMF network.

Well, tools like mtr or even IPv6-enabled traceroute could be of
immense help there. I must say, I've not found a performance problem
in the IPv6 connectivity in the WMF cluster yet. Feel free to provide
us with more information, though, if you have persistent problems.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Introducing Math rendering 2.0

2014-10-24 Thread Alexandros Kosiaris
Really happy to see this going live!

On my part, many thanks to Moritz for pushing this forward and being
such a cool person to work with, and of course the rest of the team
for helping push out such a cool service :-)



On Thu, Oct 23, 2014 at 11:11 PM, Gabriel Wicke gwi...@wikimedia.org wrote:
> Dear Wikipedians,
>
> We'd like to announce a major update of the Math (rendering) extension.
>
> For registered Wikipedia users, we have introduced a new math rendering
> mode using MathML, a markup language for mathematical formulae. Since MathML
> is not supported in all browsers [1], we have also added a fall-back mode
> using scalable vector graphics (SVG).
>
> Both modes offer crisp rendering at any resolution, which is a major
> advantage over the current image-based default. We'll also be able to make
> our math more accessible by improving screenreader and magnification support.
>
> We encourage you to enable the MathML mode in your Appearance preferences.
> As an example, the URL for this section on the English Wikipedia is:
>
>   https://en.wikipedia.org/wiki/Special:Preferences#mw-prefsection-rendering
>
> For editors, there are also two new optional features:
>
> 1) You can set the id attribute to create math tags that can be
> referenced. For example, the following math tag
>
> <math id="MassEnergyEquivalence">
> E=mc^2
> </math>
>
> can be referenced by the wikitext
>
> [[#MassEnergyEquivalence|mass energy equivalence]]
>
> This is true regardless of the rendering mode used.
>
> 2) In addition, there is the attribute display with the possible values
> block or inline. This attribute can be used to control the layout of the
> math tag with regard to centering and size of the operators. See
> https://www.mediawiki.org/wiki/Extension:Math/Displaystyle
> for a full description of this feature.
>
> Your feedback is very welcome. Please report bugs in Bugzilla against the
> Math extension, or post on the talk page here:
> https://www.mediawiki.org/wiki/Extension_talk:Math
>
> All this is brought to you by Moritz Schubotz and Frédéric Wang (both
> volunteers) in collaboration with Gabriel Wicke, C. Scott Ananian,
> Alexandros Kosiaris and Roan Kattouw from the Wikimedia Foundation. We also
> owe a big thanks to Peter Krautzberger and Davide P. Cervone of MathJax for
> the server-side math rendering backend.
>
> Best,
>
> Gabriel Wicke (GWicke) and Moritz Schubotz (Physikerwelt)
>
>
> [1]: Currently MathML is supported by Firefox & other Gecko-based browsers,
> and accessibility tools like Apple's VoiceOver. There is also partial
> support in WebKit.
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l



-- 
Alexandros Kosiaris akosia...@wikimedia.org

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Deprecation notice for etherpad-old.wikimedia.org

2013-12-30 Thread Alexandros Kosiaris
Hello everyone,

This has just taken place. etherpad-old.wikimedia.org no longer exists.

On Tue, Nov 26, 2013 at 6:09 PM, Alexandros Kosiaris
akosia...@wikimedia.org wrote:
> Hello,
>
> As many of you might be aware, etherpad.wikimedia.org was migrated
> a couple of months ago from the old and no longer supported
> etherpad software to the new, actively supported, etherpad-lite
> software. The move also involved the migration of pads from the old
> software to the new, a process which was quite successful, albeit not
> without glitches. Since then, the old installation has been kept around
> at http://etherpad-old.wikimedia.org in a read-only state in order
> to allow people to access pads that, for whatever reason, may not
> have made it to the new installation unscathed (or at all). This
> service will be discontinued and taken offline on Monday, 30 December
> 2013. That grants 30+ days for people to copy out any necessary pads.
> With that in mind, the operations team would like to remind everyone
> that etherpad.wikimedia.org was never intended to be permanent
> storage for pads. Preservation of a pad is up to the people interested
> in preserving that pad in another format.
>
> Regards,
>
> --
> Alexandros Kosiaris akosia...@wikimedia.org



-- 
Alexandros Kosiaris akosia...@wikimedia.org

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Deprecation notice for etherpad-old.wikimedia.org

2013-11-26 Thread Alexandros Kosiaris
Hello,

As many of you might be aware, etherpad.wikimedia.org was migrated
a couple of months ago from the old and no longer supported
etherpad software to the new, actively supported, etherpad-lite
software. The move also involved the migration of pads from the old
software to the new, a process which was quite successful, albeit not
without glitches. Since then, the old installation has been kept around
at http://etherpad-old.wikimedia.org in a read-only state in order
to allow people to access pads that, for whatever reason, may not
have made it to the new installation unscathed (or at all). This
service will be discontinued and taken offline on Monday, 30 December
2013. That grants 30+ days for people to copy out any necessary pads.
With that in mind, the operations team would like to remind everyone
that etherpad.wikimedia.org was never intended to be permanent
storage for pads. Preservation of a pad is up to the people interested
in preserving that pad in another format.

Regards,

-- 
Alexandros Kosiaris akosia...@wikimedia.org

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Fwd: [wikimedia #6138] etherpad.wikimedia.org downtime due to upgrade

2013-10-31 Thread Alexandros Kosiaris
FYI,


-- Forwarded message --
From: Core operations via RT core-...@rt.wikimedia.org
Date: Thu, Oct 31, 2013 at 12:18 PM
Subject: [wikimedia #6138] etherpad.wikimedia.org downtime due to upgrade
To: akosia...@wikimedia.org


Scheduling a downtime for etherpad.wikimedia.org on Wednesday 06/11/2013 in
order to upgrade it to the latest released version. The downtime is scheduled
to last one (1) hour and will start at 09:00 UTC. We will be upgrading from
1.2.11 (released 3 months ago) to 1.3 (released 10 days ago). The package will
be created and made available on apt.wikimedia.org during the upgrade.
This upgrade will reportedly solve some issues with pad corruption experienced
by the Language Engineering team.


--
(ticket has been created)


-- 
Alexandros Kosiaris akosia...@wikimedia.org

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Fwd: [wikimedia #5912] Upgrade PHP throughout the cluster to 5.3.10-1ubuntu3.8+wmf1 on Thursday 2013-10-08

2013-10-08 Thread Alexandros Kosiaris
FYI


-- Forwarded message --
From: Core operations via RT core-...@rt.wikimedia.org
Date: Tue, Oct 8, 2013 at 12:14 PM
Subject: [wikimedia #5912] Upgrade PHP throughout the cluster to
5.3.10-1ubuntu3.8+wmf1 on Thursday 2013-10-08
To: akosia...@wikimedia.org


Scheduling an upgrade of PHP throughout the cluster from 5.3.10-1ubuntu3.6+wmf1
to 5.3.10-1ubuntu3.8+wmf1. The changes are three CVEs:

CVE-2013-4635
CVE-2013-4113
CVE-2013-4248

and bug #63055 per RT #5209 (which was not solved due to one more bug)

The packages have been built and tested on beta and test.wikipedia.org and no
problems have arisen. The upgrade is expected to not be noticeable.


--
(ticket has been created)


-- 
Alexandros Kosiaris akosia...@wikimedia.org

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Fwd: [wikimedia #5841] etherpad.wikimedia.org downtime due to upgrade

2013-09-30 Thread Alexandros Kosiaris
Hello Federico,

Some (but not all) of the etherpad links you provided do not have any
content. There are two ways to proceed.

If they are new enough (after 2013-08-19), it might be a bug in
etherpad-lite. In that case we should wait for the upgrade, try to
reproduce it, and file a bug report with the authors of the software. I am
afraid the content of the pad is probably lost.

If they are old enough (before 2013-08-19), there probably was a
problem with the migration from the old etherpad to the new one. The
scripts provided by the authors to do the migration were buggy enough
to justify such a problem. We did have to copy some pads manually,
however the ones you list were not among them. In that case it is
still possible to recover the content of the pad from the old etherpad
installation (we have kept it around). All you need to do is access it
via http://etherpad-old.wikimedia.org instead of
http://etherpad.wikimedia.org


On Fri, Sep 27, 2013 at 10:53 PM, Federico Leva (Nemo)
nemow...@gmail.com wrote:
> What should I do if a number of etherpads seem to be completely gone?
> Can someone click on any of the pad links on
> https://meta.wikimedia.org/wiki/Wikimedia_Conference_2013/Schedule/Saturday
> and tell me if they are able to see some content? Thanks.
>
> Nemo
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l



-- 
Alexandros Kosiaris akosia...@wikimedia.org

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Fwd: [wikimedia #5841] etherpad.wikimedia.org downtime due to upgrade

2013-09-30 Thread Alexandros Kosiaris
FYI,


-- Forwarded message --
From:  core-...@rt.wikimedia.org
Date: Mon, Sep 30, 2013 at 2:12 PM
Subject: Fwd: [wikimedia #5841] etherpad.wikimedia.org downtime due to upgrade

etherpad-lite has been successfully updated to version 1.2.11. The
upgrade procedure, server-wise, was uneventful; however, it will cause
some minor problems for existing users of the service. Specifically,
CSS/JS elements of the page have changed and need to be re-downloaded
by the browser, but due to browser caching this does not happen
automatically. Users of the old version will have to FORCE REFRESH
their browser when accessing the service for the first time. Otherwise
they will get garbled versions of the user interface. Pad contents
will be intact, however a brief message suggesting the user does not
have permission to access a pad might show up. That message is
inaccurate and is a by-product of the garbled UI.


-- 
Alexandros Kosiaris akosia...@wikimedia.org

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Fwd: [wikimedia #5841] etherpad.wikimedia.org downtime due to upgrade

2013-09-27 Thread Alexandros Kosiaris
FYI

-- Forwarded message --
From: Core operations via RT core-...@rt.wikimedia.org
Date: Thu, Sep 26, 2013 at 1:01 PM
Subject: [wikimedia #5841] etherpad.wikimedia.org downtime due to upgrade
To: akosia...@wikimedia.org


I am scheduling a downtime for etherpad.wikimedia.org on Monday 30/09/2013 in
order to upgrade it to the latest released version. The downtime is scheduled
to last one (1) hour and will start at 09:00 UTC. We will be upgrading from 1.0
(released 2 years ago) to 1.2.11 (released 3 months ago). The package has
already been created and will be made available on apt.wikimedia.org during
the upgrade. Hopefully a lot of the bugs we have witnessed that cause problems
on etherpad.wikimedia.org will be resolved.


--
(ticket has been created)


-- 
Alexandros Kosiaris akosia...@wikimedia.org

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l