[Wikitech-l] Re: API issue

2024-01-09 Thread Giuseppe Lavagetto
Hi,

without further details on what requests you make, what user-agent your
script uses, and the times at which you've seen this issue, it's going to
be hard to help you.

My bet is that you've run into one of our throttling/anti-abuse rules.

If you open a task on Phabricator with details (if possible, an IP would
also help; in that case, feel free to make the task private) and point me to
it, I can investigate what is going on. I'm pretty confident the problem is
not the servers being overloaded.
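
As an aside, a minimal well-behaved client looks something like the
following - a sketch in Python; the tool name, contact address and page
titles are placeholders:

    import time
    import requests

    # A descriptive User-Agent with contact info, per API:Etiquette
    HEADERS = {
        "User-Agent": "ExampleTool/0.1 (https://example.org/tool; tool@example.org)"
    }
    API = "https://en.wikipedia.org/w/api.php"

    for title in ["Earth", "Moon"]:
        # Serial requests: wait for one to finish before sending the next.
        # maxlag asks the servers to refuse the request when replication
        # lag is high, instead of piling more load on them.
        r = requests.get(API, headers=HEADERS, params={
            "action": "query", "format": "json",
            "titles": title, "maxlag": 5,
        })
        r.raise_for_status()
        print(r.json())
        time.sleep(1)  # be gentle between requests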

Cheers,

G.


On Wed, Jan 10, 2024 at 6:28 AM ovskmendov--- via Wikitech-l <
wikitech-l@lists.wikimedia.org> wrote:

> I know. As I said before, this literally started on the first request I
> made.
>
> Sent with Proton Mail <https://proton.me/> secure email.
>
> On Wednesday, January 10th, 2024 at 12:23 AM, Dalba 
> wrote:
>
> Maybe this is not it, but worth checking: "Make your requests in series
> rather than in parallel, by waiting for one request to finish before
> sending a new request."[1]
> [1]: https://www.mediawiki.org/wiki/API:Etiquette
>
> On Wed, Jan 10, 2024 at 8:45 AM ovskmendov--- via Wikitech-l <
> wikitech-l@lists.wikimedia.org> wrote:
>
>> >Maybe someone else is doing it from the same IP, or maybe it's a bug.
>> You could file a report
>> <https://phabricator.wikimedia.org/maniphest/task/edit/form/43/> in
>> Phabricator with the details of what you are doing and the details from
>> that error page.
>>
>> Given that I have a clean residential IPv6, that is very unlikely. Plus,
>> it was working just a few days ago. And it still doesn’t work with a VPN.
>>
>> I don’t think it’s a bug, as a few requests go through fine; based on the
>> behavior, I think it’s server overload or something like that.
>>
>> Sent with Proton Mail <https://proton.me/> secure email.
>>
>> On Wednesday, January 10th, 2024 at 12:06 AM, Gergo Tisza <
>> gti...@wikimedia.org> wrote:
>>
>> On Tue, Jan 9, 2024 at 8:53 PM ovskmendov--- via Wikitech-l <
>> wikitech-l@lists.wikimedia.org> wrote:
>>
>>> I am not sending very many requests. I haven’t sent any requests in
>>> several days and yet I get the error message.
>>>
>>
>> Maybe someone else is doing it from the same IP, or maybe it's a bug. You
>> could file a report
>> <https://phabricator.wikimedia.org/maniphest/task/edit/form/43/> in
>> Phabricator with the details of what you are doing and the details from
>> that error page.
>>
>>
>> ___
>> Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
>> To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
>>
>> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
>
>
> ___
> Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
> To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/



-- 
Giuseppe Lavagetto
Principal Site Reliability Engineer, Wikimedia Foundation
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

[Wikitech-l] Re: March 2023 Datacenter Switchover

2023-03-01 Thread Giuseppe Lavagetto
…that the datacenter switchover will happen
>> on
>> >> *Wednesday
>> >> March 1st* starting at *14:00 UTC.*
>> >>
>> >> Please refer to the original email for any additional information. As
>> >> always, you can reach out to me directly or the SRE team in
>> >> #wikimedia-sre
>> >> on IRC with any question, or through Phabricator.
>> >>
>> >> Thank you,
>> >>
>> >> On Tue, Feb 14, 2023 at 1:58 PM Clément Goubert <
>> cgoub...@wikimedia.org>
>> >> wrote:
>> >>
>> >>> Dear Wikitechians,
>> >>>
>> >>> On *Wednesday March 1st*, the SRE team will run a planned data center
>> >>> switchover, moving all wikis from our primary data center in Virginia
>> to
>> >>> the secondary data center in Texas. This is an important periodic test
>> >>> of
>> >>> our tools and procedures, to ensure the wikis will continue to be
>> >>> available
>> >>> even in the event of major technical issues in our primary home. It
>> also
>> >>> gives all our SRE and ops teams a chance to do maintenance and
>> upgrades
>> >>> on
>> >>> systems in Virginia that normally run 24 hours a day.
>> >>>
>> >>> The switchover process requires a *brief read-only period for all
>> >>> Foundation-hosted wikis*, which will start at *14:00 UTC on Wednesday
>> >>> March 1st*, and will last for a few minutes while we execute the
>> >>> migration as efficiently as possible. All our public and private wikis
>> >>> will
>> >>> be continuously available for reading as usual, but no one will be
>> able
>> >>> to
>> >>> save edits during the process. Users will see a notification of the
>> >>> upcoming maintenance, and anyone still editing will be asked to try
>> >>> again
>> >>> in a few minutes.
>> >>>
>> >>> CommRel has already begun notifying communities of the read-only
>> window.
>> >>> A similar event will follow a few weeks later, when we move back to
>> >>> Virginia. This is currently scheduled for *Wednesday, April 26th*.
>> >>>
>> >>> If you like, you can follow along on the day in the public
>> >>> #wikimedia-operations channel on IRC (instructions for joining here
>> >>> <https://meta.wikimedia.org/wiki/IRC/Instructions>). To report any
>> >>> issues, you can reach us in #wikimedia-sre on IRC, or file a
>> Phabricator
>> >>> ticket with the *datacenter-switchover* tag (pre-filled form here
>> >>> <
>> https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=Datacenter-Switchover=Clement_Goubert
>> >);
>> >>> we'll be monitoring closely for reports of trouble during and after
>> the
>> >>> switchover. (If you're new to Phab, there's more information at
>> >>> Phabricator/Help.) The switchover and its preparation are tracked
>> >>> in Phabricator task T327920
>> >>> <https://phabricator.wikimedia.org/T327920>
>> >>>
>> >>> On behalf of the SRE team, please excuse the disruption, and our
>> thanks
>> >>> to everyone in a number of departments who've been involved in
>> planning
>> >>> this work for the past weeks. Feel free to reply directly to me with
>> any
>> >>> questions.
>> >>>
>> >>> Thank you,
>> >>>
>> >>> --
>> >>> Clément Goubert (they/them)
>> >>> Senior SRE
>> >>> Wikimedia Foundation
>> >>>
>> >>
>> >>
>> >> --
>> >> Clément Goubert (they/them)
>> >> Senior SRE
>> >> Wikimedia Foundation
>> >>
>> >
>> >
>> > --
>> > Clément Goubert (they/them)
>> > Senior SRE
>> > Wikimedia Foundation
>> >
>> ___
>> Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
>> To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
>>
>> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
>
>
>
> --
> Amir (he/him)
>
> ___
> Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
> To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/



-- 
Giuseppe Lavagetto
Principal Site Reliability Engineer, Wikimedia Foundation
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

[Wikitech-l] Re: ClassCrawler – extremely fast and structured code search engine

2022-02-06 Thread Giuseppe Lavagetto
On Sat, Feb 5, 2022 at 10:19 PM Daniel Kinzler 
wrote:

> Am 05.02.22 um 21:38 schrieb Amir Sarabadani:
>
> Codesearch has been working fine in the past couple of years. There is a
> new frontend being built and I hope we can deploy it soon to provide a
> better user experience and I personally don't see a value in
> re-implementing codesearch. Especially using non-open source software.
>
> While I agree with several points that have been raised, in particular
> about licensing and building on top of existing tools, I'd like to point
> out that the idea is not to re-implement codesearch, but to overcome some
> of its limitations. What we use codesearch for most is finding usages of
> methods (and sometimes classes). This works fine if the method name is
> fairly unique. But if the method name is generic, or you are moving a
> method from one class to another and you want to find callers of the old
> method, but not the new method, then regular expressions just don't cut it.
>
Ok, why do you think symbol search can't be integrated into the current
codesearch? That's what Amir was proposing. Sadly, I don't think much of the
current code of ClassCrawler can be reused for that goal, which is a pity.
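
As a quick illustration of the limitation Daniel describes - a sketch in
Python, with made-up class and method names:

    import re

    code = """
    $user->getName();   // User::getName() - the caller we want to find
    $block->getName();  // Block::getName() - same text, different class
    """
    # A purely textual search cannot tell which class each call resolves to:
    print(re.findall(r"->getName\(", code))  # matches both call sites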

Cheers,
 Giuseppe
-- 
Giuseppe Lavagetto
Principal Site Reliability Engineer, Wikimedia Foundation
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

[Wikitech-l] Re: New experimental backend for the Wikimedia Debug browser extension: what to expect

2021-07-28 Thread Giuseppe Lavagetto
A quick update: it was decided that, until we get to the point where we can
easily keep releases in sync with production, we will limit public access
to the k8s installation to group0 and test wikis. The restriction will be
lifted once we feel confident we'll run the same code on k8s and on
physical servers at all times.

Thanks,

Giuseppe

On Tue, Jul 27, 2021 at 6:16 PM Giuseppe Lavagetto 
wrote:

> Hi all,
>
> This email is of interest to you only if you're a user of the "Wikimedia
> Debug" browser extension. If you're not, you can safely skip it.
>
> As the more attentive might have noticed, the Wikimedia Debug browser
> extension started offering a new option in the drop-down menu, besides the
> usual mwdebug servers, labeled "k8s-experimental". That is, as the name
> suggests, a very experimental setup of mediawiki running on kubernetes and
> is not *yet* a place where you will be able to test your releases.
>
> Right now, that installation is a work in progress, but nonetheless it
> seemed important to us to have a way to browse the wikis from the
> installation running on kubernetes while we iron out bugs in preparation
> for the actual migration of production traffic.
>
> This installation can thus:
> - run on outdated versions of MediaWiki (although we're trying to follow
> train releases);
> - be down for extended periods of time while we debug something, without
> warning.
> It also doesn't support profiling via xhprof (yet).
>
> So while we welcome the curious to poke around at the performance and bugs
> of the installation, it is not a suitable tool (yet) to debug your releases
> on.
>
> In the coming days I will add filters where appropriate to keep logs from
> this installation from polluting your dashboards, but in the meantime, if
> you see a log line coming from a server with a strange name like
> "mediawiki-pinkunicorn"... that's mediawiki running on kubernetes and you
> can mostly ignore it!
>
> You can follow our progress at https://phabricator.wikimedia.org/T283056
>
> Cheers,
>
> Giuseppe
>
> --
> Giuseppe Lavagetto
> Principal Site Reliability Engineer, Wikimedia Foundation
>


-- 
Giuseppe Lavagetto
Principal Site Reliability Engineer, Wikimedia Foundation
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

[Wikitech-l] New experimental backend for the Wikimedia Debug browser extension: what to expect

2021-07-27 Thread Giuseppe Lavagetto
Hi all,

This email is of interest to you only if you're a user of the "Wikimedia
Debug" browser extension. If you're not, you can safely skip it.

As the more attentive might have noticed, the Wikimedia Debug browser
extension started offering a new option in the drop-down menu, besides the
usual mwdebug servers, labeled "k8s-experimental". That is, as the name
suggests, a very experimental setup of MediaWiki running on Kubernetes, and
it is not *yet* a place where you will be able to test your releases.

Right now, that installation is a work in progress, but nonetheless it
seemed important to us to have a way to browse the wikis from the
installation running on kubernetes while we iron out bugs in preparation
for the actual migration of production traffic.

This installation can thus:
- run on outdated versions of MediaWiki (although we're trying to follow
train releases);
- be down for extended periods of time while we debug something, without
warning.
It also doesn't support profiling via xhprof (yet).

So while we welcome the curious to poke around at the performance and bugs
of the installation, it is not a suitable tool (yet) to debug your releases
on.

In the coming days I will add filters where appropriate to keep logs from
this installation from polluting your dashboards, but in the meantime, if
you see a log line coming from a server with a strange name like
"mediawiki-pinkunicorn"... that's mediawiki running on kubernetes and you
can mostly ignore it!

You can follow our progress at https://phabricator.wikimedia.org/T283056

Cheers,

Giuseppe

-- 
Giuseppe Lavagetto
Principal Site Reliability Engineer, Wikimedia Foundation
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

[Wikitech-l] Re: Stream of recent changes diffs

2021-07-09 Thread Giuseppe Lavagetto
On Thu, Jul 1, 2021 at 3:10 PM Andrew Otto  wrote:

> This isn't helpful now, but your use case is relevant to something I hope
> to pursue in the future: comprehensive mediawiki change events, including
> content.  I don't have a great place yet for collecting these use cases, so
> I added it to Modern Event Platform parent ticket
> <https://phabricator.wikimedia.org/T185233> so I don't forget. :)
>
>
I don't think this is the use case at all. As someone else already pointed
out, diffs don't always give you the context and might be unparsable
wikitext. So what you can do is either:
1) Always send the full content of the changed page in the stream, along
with the diff. This is IMHO extremely wasteful, but it's also easy to
implement.
2) Find a way to analyze the edits and emit specialized event tags that
describe what has changed. This is the correct way forward, IMHO, but it
requires much more engineering time.

I don't see much value in adding the full content of the page to every edit
event. I'd rather suggest that people fetch the Parsoid HTML from the API,
while we ensure we do good edge-side caching.
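
For illustration, a minimal consumer along those lines - a sketch in Python
against the public EventStreams and REST endpoints; the User-Agent string
and the stop-after-one-edit logic are just for the demo:

    import json
    import requests

    STREAM = "https://stream.wikimedia.org/v2/stream/recentchange"

    with requests.get(STREAM, stream=True,
                      headers={"User-Agent": "stream-demo/0.1 (me@example.org)"}) as resp:
        for line in resp.iter_lines(decode_unicode=True):
            if not line or not line.startswith("data: "):
                continue  # skip SSE keep-alives and event-name lines
            change = json.loads(line[len("data: "):])
            if change.get("type") != "edit":
                continue
            # Fetch the cached Parsoid HTML from the REST API instead of
            # shipping page content inside every event:
            title = requests.utils.quote(change["title"], safe="")
            url = "https://%s/api/rest_v1/page/html/%s" % (change["server_name"], title)
            html = requests.get(url, headers={"User-Agent": "stream-demo/0.1"}).text
            print(change["title"], "->", len(html), "bytes of HTML")
            break  # demo: stop after the first edit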


Cheers,

Giuseppe
P.S. Please note that I'm only referring to streams offered to tools and, in
general, to the public internet. Within the production cluster, the use of
content in events might (or might not) prove directly useful in some cases.


-- 
Giuseppe Lavagetto
Principal Site Reliability Engineer, Wikimedia Foundation
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

[Wikitech-l] Change to how we build the wikimedia base container images.

2021-05-26 Thread Giuseppe Lavagetto
[X-posting from ops-l]

Hi all,

Starting today, we are building our base container images using
debuerreotype instead of bootstrap-vz, which is unmaintained [1]. This is
the same tool that is used for the Docker Hub Debian images, and our images
are now completely equivalent to the Debian base images, plus our own apt
configuration[2].

With this change, we're also introducing a simpler nomenclature for our
base images: we will tag our images with "$codename" instead of
"wikimedia-$codename". Thus:

- the base stretch image is now docker-registry.wikimedia.org/stretch
- the base buster image is now docker-registry.wikimedia.org/buster

We have also added a new image based on the (as yet unreleased, caveat
emptor) Debian bullseye.

We will keep tagging the latest version of those images as
"wikimedia-stretch" and "wikimedia-buster" for the time being, in order to
allow for backwards compatibility, but we encourage everyone to migrate to
the new naming eventually.
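
For example, pulling an image under both namings with the Docker SDK for
Python - a sketch; the plain "docker pull" CLI works the same way:

    import docker

    client = docker.from_env()
    # New, simpler tag:
    client.images.pull("docker-registry.wikimedia.org/buster", tag="latest")
    # Old name, kept for backwards compatibility for the time being:
    client.images.pull("docker-registry.wikimedia.org/wikimedia-buster", tag="latest")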

Cheers,

Giuseppe

[1] https://phabricator.wikimedia.org/T281984
[2] Our very simple build script is here:
https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/refs/heads/production/modules/docker/files/build-bare-slim.sh

-- 
Giuseppe Lavagetto
Principal Site Reliability Engineer, Wikimedia Foundation
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

[Wikitech-l] TechCom meeting 2020-10-28

2020-10-28 Thread Giuseppe Lavagetto
This is the weekly TechCom board review in preparation for our meeting on
Wednesday. If there are additional topics for TechCom to review, please let
us know by replying to this email. However, please keep discussion about
individual RFCs to the Phabricator tickets.

Activity since Monday 2020-10-19 on the following boards:

https://phabricator.wikimedia.org/tag/techcom/

https://phabricator.wikimedia.org/tag/techcom-rfc/

Committee board activity:

   - T239742 <https://phabricator.wikimedia.org/T239742>: Should npm packages
     maintained by Wikimedia be scoped or unscoped? Support for the idea of
     namespacing our npm packages has been expressed.


RFCs:

Phase progression:

   - T262946 <https://phabricator.wikimedia.org/T262946>: Bump Firefox version
     in basic support to 3.6 or newer.
     Started the last call process. Some clarification requested, but no new
     opposition.

IRC meeting request:

   - T263841 <https://phabricator.wikimedia.org/T263841>: RFC: Expand API
     title generator to support other generated data.
     Some guidance through the process is requested from TechCom.

Other RFC activity:

   - T119173 <https://phabricator.wikimedia.org/T119173>: RFC: Discourage
     use of MySQL's ENUM type. A question was asked of the DBAs about using
     ENUMs in maintenance

Cheers,
Giuseppe
-- 
Giuseppe Lavagetto
Principal Site Reliability Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


[Wikitech-l] TechCom meeting 2020-10-06

2020-10-06 Thread Giuseppe Lavagetto
This is the weekly TechCom board review in preparation for our meeting on
Wednesday. If there are additional topics for TechCom to review, please let
us know by replying to this email. However, please keep discussion about
individual RFCs to the Phabricator tickets.

Activity since Monday 2020-09-28 on the following boards:

https://phabricator.wikimedia.org/tag/techcom/

https://phabricator.wikimedia.org/tag/techcom-rfc/

Committee inbox:

   - T264334 <https://phabricator.wikimedia.org/T264334>: Could the
     registered module manifest be removed from the client?
     - New task about the possibility of removing the huge module registry
       from the JS sent to the client. The idea is being discussed.

Committee board activity: Nothing to report besides the inbox.

New RFCs: none.

Phase progression:

   - T262946 <https://phabricator.wikimedia.org/T262946>: Bump Firefox
     version in basic support to 3.6 or newer
     - Moves to P3 (explore)
     - It is pointed out that we’ve dropped support in production for TLS
       1.0/1.1 in January, so de facto only Firefox 27+ is able to connect
       to the Wikimedia sites
     - In light of that, it’s suggested that we might bump the minimum
       supported versions of browsers further.

IRC meeting request: none

Other RFC activity:

   - T260714 <https://phabricator.wikimedia.org/T260714>: Parsoid Extension
     API.
     - On last call to be approved; the last call will end on October 7
       (tomorrow).
   - T487 <https://phabricator.wikimedia.org/T487>: RfC: Associated
     namespaces.
     - On last call to be declined; there is some opposition to marking it
       as declined on Phabricator. Last call should end on October 7
       (tomorrow).
   - T263841 <https://phabricator.wikimedia.org/T263841>: RFC: Expand API
     title generator to support other generated data.
     - Erik asks whether this is going to be generally applied to all
       generators or not.

Cheers,
Giuseppe
-- 
Giuseppe Lavagetto
Principal Site Reliability Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] [Ops] CI downtime Monday May 11th 12:00 UTC

2020-05-07 Thread Giuseppe Lavagetto
On Thu, May 7, 2020 at 5:20 PM Antoine Musso  wrote:
[CUT]

> If a change has to happen on the configuration repositories:
>
> * operations/puppet :
>   locally run: bundle update && bundle exec rake test
>

If you have Docker installed, you can also run the script
$puppet_dir/utils/run_ci_locally.sh
which, unsurprisingly, runs the same tests as CI does, in the same container.

Cheers,
Giuseppe
-- 
Giuseppe Lavagetto
Principal Site Reliability Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] For title normalization, what characters are converted to uppercase ?

2019-08-04 Thread Giuseppe Lavagetto
On Sun, Aug 4, 2019 at 11:34 AM Nicolas Vervelle 
wrote:

> Thanks Brian,
>
> Great for the link to Php72ToUpper.php !
> I think I understand with it : for example, the first line says 'ƀ' => 'ƀ',
> which should mean that this letter shouldn't be converted to uppercase by
> MW ?
> That's one of the letter I found that wasn't converted to uppercase and
> that was generating a false positive in my code : so it's because specific
> MW code is preventing the conversion :-)
>

Hi!

No, that file is a temporary measure during a transition between two
versions of php.

In HHVM and PHP 5.x, calling mb_strtoupper("ƀ") would give the erroneous
result "ƀ".

In PHP 7.x, the result is the correct capitalization.

The issue is that the titles of wiki articles get normalized, so under PHP 7
we would have

ƀar => Ƀar

which would prevent you from being able to reach the page.

Once we're done with the transition and we go through the process of
converting the (several hundred) pages/users that have the wrong title
normalization, we will remove that table and obtain the correct behaviour.
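
As a quick illustration - in Python 3, whose str.upper() follows the
current Unicode case mappings (i.e. the PHP 7 behaviour), rather than PHP:

    # 'ƀ' (U+0180) correctly uppercases to 'Ƀ' (U+0243):
    assert "\u0180".upper() == "\u0243"
    title = "ƀar"
    print(title[0].upper() + title[1:])  # -> 'Ƀar', the PHP 7 normalization
    # HHVM/PHP 5.x erroneously left 'ƀ' unchanged; Php72ToUpper.php pins
    # that old behaviour until the affected titles are migrated.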

You just need to subscribe to https://phabricator.wikimedia.org/T219279 and
wait for its resolution, I think - most Unicode horrors are fixed in recent
versions of PHP, including the one you were citing.

Cheers,

Giuseppe
-- 
Giuseppe Lavagetto
Principal Site Reliability Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] PHP 7 is now a beta feature

2019-01-28 Thread Giuseppe Lavagetto
Hi all,

as some of you might know, the HHVM team decided some time ago to drop
support for PHP, choosing to support only Hack (Facebook's own PHP-derived
language)[1].

This forced us to consider alternatives. In particular, the last major
upgrade to PHP, PHP 7, was supposed to have greatly improved the
performance of the runtime, guaranteeing performance on par with HHVM.

Given that early tests[2] showed promising performance, we decided to work
on PHP7 support and on its rollout in production.

I'm happy to announce that PHP 7 is now available as a beta feature on all
wikis, and I encourage everyone to try it out and report bugs using the
#php7.2-support tag.

After this period of beta testing, we will proceed with a progressive
rollout to a growing percentage of users, and hopefully we'll complete the
transition in the next four months.

A huge thank you to all the people who worked hard to reach this goal!

Thanks,

Giuseppe
[1] https://hhvm.com/blog/2017/09/18/the-future-of-hhvm.html
[2]
https://lists.wikimedia.org/pipermail/wikitech-l/2017-September/088854.html
-- 
Giuseppe Lavagetto
Principal Site Reliability Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Engineering] Gerrit now automatically adds reviewers

2019-01-17 Thread Giuseppe Lavagetto
On Thu, Jan 17, 2019 at 10:52 PM Greg Grossmeier  wrote:

> Hello,
>
> Yesterday we (the Release Engineering team) enabled a Gerrit plugin that
> will automatically add reviewers to your changes based on who previously
> has committed changes to the file.
>
>
While I commend the intention, this means I will get pinged for virtually
any change in a couple of very busy repositories.

The amount of noise will prevent me from being able to notice anyone's
review request. I think it's going to be the same for other developers - I
don't want to imagine what the inbox of a long-time mediawiki-core
contributor must look like!

What I fear is that the flood of reviews will make everyone just numb to
notifications, achieving the exact opposite of the intended effect. I say
this because I auto-added myself to all reviews in operations/puppet[1] in
the past, which resulted in me ignoring all code review requests.

I think a good compromise would be to modify the plugin so that it adds
reviewers automatically only if you're a new contributor (say, with fewer
than N patches submitted).

While this gets improved, is there a way to opt out of the feature
individually or as a project?

Thanks,
Giuseppe

[1] we already have a way to "monitor" all changes to a repository, to a
directory within a repository, or even to individual files, which I was
using extensively. Should we remove that?
-- 
Giuseppe Lavagetto
Principal Site Reliability Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Wmfall] Datacenter Switchover recap

2018-09-14 Thread Giuseppe Lavagetto
Sorry for the copy/paste fail, I meant



> So I want to congratulate everyone who was involved in the process, that
> includes most of the people on the core platform, performance, search and
> SRE teams, but a special personal thanks goes to
> Alexandros and Riccardo for driving most of the process and allowing me to
> care about the switchover for less than a week before it happened and, yes,
> to take the time to fix that bug too :)
>
>
Cheers,

Giuseppe
-- 
Giuseppe Lavagetto
Principal Site Reliability Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Wmfall] Datacenter Switchover recap

2018-09-14 Thread Giuseppe Lavagetto
On Thu, Sep 13, 2018 at 7:49 AM Bryan Davis  wrote:

>
> Everyone involved worked hard to make this happen, but I'd like to
> give a special shout out to Giuseppe Lavagetto for taking the time to
> follow up on a VisualEditor problem that affected Wikitech
> (<https://phabricator.wikimedia.org/T163438>). We noticed during the
> April 2017 switchover that the client side code for VE was failing to
> communicate with the backend component while the wikis were being
> served from the Dallas datacenter. We guessed that this was a
> configuration error of some sort, but did not take the time to debug
> in depth. When the issue reoccurred during the current datacenter
> switch, Giuseppe took a deep dive into the code and configuration,
> identified the configuration difference that triggered the problem,
> and made a patch for the Parsoid backend that fixes Wikitech.
>
>
While I'm flattered by the compliments, I think it's fair to underline that
the problem was partly caused by a patch I made to Parsoid some time ago. So
I mostly cleaned up a problem I caused - does this count for getting a new
t-shirt, even if I fixed it more than a year late? :P

On the other hand, I want to join the choir praising the work that has been
done for the switchover, and take the time to list all the things we've
done collectively to make it as uneventful and fast (read-only time was
less than 8 minutes this time) as it was:
- MediaWiki now fetches its read-only state and which datacenter is the
master from etcd, eliminating the need for a code deployment (see the
sketch after this list)
- We now connect to our per-datacenter distributed cache via mcrouter,
which allows us to keep the caches in the various datacenters consistent.
This eliminated the need to wipe the cache during the read-only phase,
resulting in a big reduction of the read-only window
- Our old jobqueue not only gave me innumerable debugging nightmares, but
was hard and tricky to handle in a multi-datacenter environment. We have
replaced it with a more modern system, which needed no intervention
during the switchover
- Our media storage system (Swift + thumbor) is now active-active and we
write and read from both datacenters
- We created a framework for easily automating complex orchestration tasks
(like a switchover) called "spicerack", which will benefit our operations
in general and has the potential to reduce the toil on the SRE team, as
proven, automated procedures can be coded for most events.
- Last but not least, the Dallas datacenter (codenamed "codfw") needed
little to no tuning when we moved all traffic, and we had to fix virtually
nothing that went out of sync during the last 1.4 years. I know this might
sound unimpressive, but keeping a datacenter that's not really used in good
shape and in sync is a huge accomplishment in itself; I've never before
seen such a show of flawless execution and collective discipline.
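
To make the first item above concrete, here is a sketch of the idea using
the python-etcd3 client; the host and keys below are hypothetical, and the
real key layout in production differs:

    import etcd3

    client = etcd3.client(host="etcd.example.org", port=2379)
    # Hypothetical keys; flipping them changes MediaWiki's behaviour
    # fleet-wide without deploying any code.
    readonly, _ = client.get("/example/mediawiki/ReadOnly")
    master_dc, _ = client.get("/example/mediawiki/MasterDatacenter")
    print("read-only:", readonly, "- master datacenter:", master_dc)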

So I want to congratulate everyone who was involved in the process; that
includes most of the people on the core platform, performance, search and
SRE teams, but a special personal thanks goes to
- The whole SRE team, and really anyone working on our production
environment, for keeping the Dallas datacenter in good shape for more than
a year, so that we didn't need to adjust almost anything pre- or
post-switchover
- Alexandros and Riccardo for driving most of the process and allowing me
to care about the switchover for less than a week before it happened and,
yes, to take the time to fix that bug too :)

Cheers,

Giuseppe
P.S. I'm sure I forgot someone / something amazing we've done; I apologize
in advance.
-- 
Giuseppe Lavagetto
Principal Site Reliability Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Engineering] Phabricator spam - account approval requirement enabled

2018-06-30 Thread Giuseppe Lavagetto
On Sun, Jul 1, 2018 at 7:16 AM Niharika Kohli  wrote:

> On Sat, Jun 30, 2018 at 8:53 PM Greg Grossmeier 
> wrote:
>
>> Hello,
>>
>> Unfortunately we are experiencing spam in our Phabricator instance
>> again and have decided to turn on the requirement for new account
>> approval by Phabricator admins as a mitigation step.
>>
>
> I'd request that it please be kept on until we have some spam mitigation
> tools. At the very least easier revert actions.
>
>
Indeed.
We should *not* remove the approval process until a better anti-vandalism
system is available for Phabricator.

Repairing the damage that has been done will require a ton of man-hours.

Cheers,
Giuseppe
-- 
Giuseppe Lavagetto
Senior Technical Operations Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Gerrit login oddities

2018-02-13 Thread Giuseppe Lavagetto
On Tue, Feb 13, 2018 at 5:56 AM, Chad  wrote:

> Hi,
>
> Two quick things:
>
> * First, we updated the cookie path for Gerrit to enable sharing
> authentication with the new Gitiles repo browser--for most users this has
> been transparent but a few people have reported problems with being able to
> login. If this happens, try clearing out any cookies you have for
> gerrit.wikimedia.org and try logging in anew.


Just to help people not lose their settings and other things - you just
need to remove the GerritAccount cookie for gerrit.wikimedia.org, and then
you'll be able to log in again.

Cheers

Giuseppe
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] HHVM vs. Zend divergence

2017-09-21 Thread Giuseppe Lavagetto
On Wed, Sep 20, 2017 at 10:56 PM, Brion Vibber <bvib...@wikimedia.org>
wrote:

> On Mon, Sep 18, 2017 at 1:58 PM, Max Semenik <maxsem.w...@gmail.com>
> wrote:
>
> >
> > 3) Revert WMF to Zend and forget about HHVM. This will result in
> > performance degradation, however it will not be that dramatic: when we
> > upgraded, we switched to HHVM from PHP 5.3 which was really outdated,
> while
> > 5.6 and 7 provided nice performance improvements.
> >
>
> Migrating WMF's implementation to PHP 7 is probably the way to go. I leave
> it up to ops to figure out how to make the change. :)
>


I think this is the more viable option too, mostly to avoid causing huge
issues for non-Wikimedia users. PHP is far easier to install and operate on
most platforms than HHVM. Just to give an example, I don't remember ever
having to check the code of the VM to understand what an ini setting does
with PHP (or even PHP-FPM), while with HHVM that has been a recurring pain,
as options come and go without any warning between versions, not to mention
the online docs, which can even be plainly misleading. At the moment, there
is no doubt PHP is a much friendlier environment for any third-party wiki.


But I think there is other value in going the PHP 7 way; I have actually
been pitching the idea of switching to PHP 7 for some time now, for various
reasons, namely:
- We use HHVM differently than Facebook does. For instance, we use the
FastCGI server mode, not the pure HTTP one, and we don't run in repoAuth
mode. This has brought us new bugs to solve every time we upgrade, as there
is no battle-testing in production for those code patterns at scale before
we use them.
- The recent, prolonged difficulty of interaction with the FLOSS community,
although acknowledged by the HHVM team as something they're willing to fix,
is worrying in itself.
- PHP 7 has shown performance comparable to HHVM for most PHP shops that
migrated. So the single most compelling reason we migrated (performance)
might not be a factor anymore. Using a runtime readily available (and
security-patched) from the upstream distribution would make the ops team's
lives easier as well.

As for the actual migration:

I don't think there is any need to panic or rush to a decision, but the
timeline is pretty set: by the end of 2018, when official support for HHVM
3.24 ends, any migration should be well underway within the WMF
infrastructure. I expect a migration from HHVM to PHP 7 to be a less
formidable undertaking than the switch from PHP 5.3 to HHVM - we repaid a
good deal of the tech debt in the Wikimedia Foundation installation back
then, and we won't have to radically change the way we serve MediaWiki, as
PHP 7 works as a FastCGI server as well. Still, it will take time and
resources, and it needs to be planned in advance.

One important consequence of the announcement for Wikimedia is that we
won't be able to use any version of MediaWiki not compatible with PHP 5.x
until we transition to PHP 7 (unless we decide to support both HHVM and
PHP 7). This might be important in steering the timing of the change in
MediaWiki itself.

Cheers,

Giuseppe

-- 
Giuseppe Lavagetto, Ph.d.
Senior Technical Operations Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Ops] [Engineering] The train will resume tomorrow (was Re: All wikis reverted to wmf.8 last night due to T119736)

2016-07-12 Thread Giuseppe Lavagetto
On Wed, Jul 13, 2016 at 2:15 AM, aude <aude.w...@gmail.com> wrote:
> On Tue, Jul 12, 2016 at 7:56 PM, Ori Livneh <o...@wikimedia.org> wrote:
>> Our failure to react to this swiftly and comprehensively is appalling and
>> embarrassing. It represents failure of process at multiple levels and a lack
>> of accountability.
>
>
> This (unbreak now) bug has been open since November.  I wonder how this has
> been allowed to remain open and not addressed for this long?
>

I am sure we could've done way better even in our current structure,
but it's pretty clear to me that the absence of a team dedicated to
MediaWiki itself invites such things to happen.

Which is pretty absurd, when you remember that 99% of our traffic is
still served by it.

Cheers

G.
-- 
Giuseppe Lavagetto, Ph.d.
Senior Technical Operations Engineer, Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Upgrading HHVM to libicu52

2016-05-25 Thread Giuseppe Lavagetto
Hi all,

tomorrow, Thursday May 25th, I will be upgrading our HHVM servers to
use a recent version of the ICU[1] library, a long-needed change that
we are finally ready to perform: it allows us to stop maintaining an
older version by ourselves, including having to patch it for any
security issue.

For details about the rationale and the long process involved, see

https://phabricator.wikimedia.org/T86096

While the upgrade should be smooth, the ICU maintainers do not
guarantee backward compatibility for collation, so to be sure that is
addressed, we will need to run a maintenance script on all wikis that
have $wgCategoryCollation set to anything including 'uca'; see

https://phabricator.wikimedia.org/diffusion/OMWC/browse/master/wmf-config/InitialiseSettings.php;2f61ae1bcffe0f7b8626d544a98eea3c4a7d7905$13676
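
A sketch of why that is needed, using the PyICU bindings: the binary
sortkeys ICU produces are tied to the library version that generated them.

    import icu

    coll = icu.Collator.createInstance(icu.Locale("en"))
    key = coll.getSortKey("Łódź")
    # The sortkey's byte layout is only guaranteed for the ICU version that
    # produced it; after the upgrade, keys already stored for category
    # sorting no longer compare consistently, hence the maintenance run.
    print(key)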


Since this script takes quite a long time to run[2], there will be
some user-facing effects during the transition period; namely, citing
what MatmaRex says on the ticket:

"After ICU is upgraded, but before the updateCollation script
finishes, articles newly added to categories may appear out-of-order
on category listing pages. The headings on them might be wrong in
funny ways, too. Nothing else should be affected."

If no last-minute showstopper blocks the process, I will be starting
the procedure around 8:00 UTC, and will log every step of the process
in the SAL[3]. Don't hesitate to contact me on IRC (#wikimedia-operations
on freenode, user _joe_) if you see some strange behaviour.

Thanks in advance for your patience

Giuseppe

[1] ICU stands for International Components for Unicode
[2] It is actually much, much faster to run now than it ever was,
thanks to the amazing work others have done to improve it, see
https://phabricator.wikimedia.org/T58041 and
https://phabricator.wikimedia.org/T130692
[3] https://wikitech.wikimedia.org/wiki/Server_Admin_Log
-- 
Giuseppe Lavagetto, Ph.d.
Senior Technical Operations Engineer, Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Reducing the environmental impact of the Wikimedia movement

2016-05-16 Thread Giuseppe Lavagetto
On Thu, Mar 31, 2016 at 12:39 AM, Tim Starling <tstarl...@wikimedia.org> wrote:

> I think it's stretching the metaphor to call ops a "tight ship". We
> could switch off spare servers in codfw for a substantial power
> saving, in exchange for a ~10 minute penalty in failover time. But it
> would probably cost a week or two of engineer time to set up suitable
> automation for failover and periodic updates.
>

Just a small clarification: I don't think periodically turning servers off
and on would be a feasible option, because servers (and computers in
general) tend to have a pretty high failure rate when being power-cycled
regularly. We see this with some servers failing every time we do a mass
reboot due to some security issue. On the other hand, we could surely do
better in terms of idle-server power consumption. In terms of costs and
time spent (and probably also natural resource consumption, but I did no
calculation whatsoever) it would probably not be sustainable.


> Or we could have avoided a hot spare colo altogether, with smarter
> disaster recovery plans, as I argued at the time.

Another small clarification: our codfw datacenter is _not_ just a hot
spare for disaster recovery and a lot of work has been done to make
the two facilities mostly active-active (and a lot more will be done
in the coming year).

Cheers,

Giuseppe
P.S. The server energy footprint of the WMF is negligible compared
to the big internet players; even a small-to-medium-size local ISP
probably has a larger footprint than us. This doesn't mean we should
not try to get better, but we should always put things in perspective.
-- 
Giuseppe Lavagetto
Senior Technical Operations Engineer, Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Services] [ANNOUNCEMENT] RESTBase and related services DC switch-over test

2016-04-04 Thread Giuseppe Lavagetto
On Mon, Mar 14, 2016 at 10:54 PM, Marko Obrovac <mobro...@wikimedia.org> wrote:
> Hello,
>
> The WMF’s technology department has for this quarter the goal of testing and
> temporarily switching the main operational data centre from Eqiad (located
> in Virginia) to Codfw (located in Dallas)~[1,2]. This includes both
> back-end-processing as well as serving live traffic from it.
>
> As a part of this effort, we are scheduling a switch-over for RESTBase and
> its back-end services, including: Parsoid, the Mobile Content Service,
> CXServer, Mathoid, Citoid, Apertium and Zotero~[3]. Technically, it will not
> be a real switch-over per se, because we will keep all of those services
> active in both DCs. However, external traffic will be directed to the Dallas
> DC only.
>

Hi all, just a quick heads-up:

given the small issues we experienced last time, which we found to be
unrelated to the switch itself, we have scheduled a new 24-hour
switch-over test, starting tomorrow (April 5th) at 14:00 UTC. We don't
expect any significant user impact.

Anyway, should you have any questions or concerns, don’t hesitate to
contact us here or on IRC (#wikimedia-services / #wikimedia-operations
@ freenode).

Cheers,

Giuseppe
-- 
Giuseppe Lavagetto, Ph.d.
Senior Technical Operations Engineer, Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Engineering] Announcing mediawiki-containers, a Docker-based MediaWiki installer

2016-01-30 Thread Giuseppe Lavagetto
On Sat, Jan 30, 2016 at 9:59 AM, Gabriel Wicke  wrote:
> Right now, Yuvi is evaluating the Kubernetes cluster manager in labs.

Just a clarification: Yuvi has already evaluated kubernetes, and it's
being actively used to build an awesome replacement for at least part
of what toollabs does right now. A handful of tools have already been
running on it, with success, for quite a long time.

> Its features include scheduling of "pods" (groups of containers) to
> hardware nodes, networking, rolling deploys and more. While all these
> features provide a very high degree of automation, they also mean that
> failures in Kubernetes can have grave consequences. I think operations
> are wise to wait for Kubernetes to mature a bit further before
> considering it for critical production use cases.
>

Failures in any complex system are surely scary, but kubernetes seems
stable enough to be evaluated for production use. We also had an
unconference session at the WMDS about this - or rather, about what we
want to achieve by using kubernetes as a tool.

I will also stress that there are more "mature" cluster/container
frameworks like Apache Mesos/Aurora/Marathon, but after taking a hard
look at them, Yuvi and I concluded that kubernetes is way more
promising for our use cases.

This is still a bit further away in the future, anyway. There is
already a phabricator task for this, which is sitting idle at
the moment as it's not on our immediate roadmap. The task, by the way,
tries to be independent of the specific technology in describing
what we actually want to achieve. Kubernetes, like any other product we
might use, is just a means to an end, and we should never be in love
with any specific technology.

https://phabricator.wikimedia.org/T122822

>  There is
> also some support to run docker images in systemd, which could be an
> alternative if we want to avoid the dependency on the docker runtime
> in production.

I guess you mean containers can run within systemd, but I don't think
just running containers instead of firejail would give us any
practical advantage at the moment from an operational perspective -
though I might be missing the point.

> Lets get together and figure out a plan.

Let's do it! Maybe next quarter, when ops are not mostly focused on the
datacenter switch, it will be easier, I guess :)

Cheers,

Giuseppe

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Port mw-vagrant to Raspberry Pi ( arm )

2015-09-30 Thread Giuseppe Lavagetto
On Tue, Sep 29, 2015 at 7:04 PM, Tony Thomas <01tonytho...@gmail.com> wrote:
> Hello,
>
[CUT]
>3.  hhvm is too ram hungry
>

If I'm not mistaken, HHVM won't compile on anything but the x86-64
architecture, so you definitely need to fall back to Zend.

Cheers,

G.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Fwd: [Tools] Kubernetes picked to provide alternative to GridEngine

2015-09-18 Thread Giuseppe Lavagetto
On Thu, Sep 17, 2015 at 4:00 PM, Brian Gerstle  wrote:
> Congrats on moving forward with a big decision!  I'm very optimistic about
> containers, so it's exciting to see movement in this area.
>
> Is there a larger arc of using this for our own services (Mediawiki,
> RESTBase, etc.), potentially in production?
>


Hi Brian,

I think we said this somewhere before, but yes, we will consider whether
kubernetes is a viable platform for running services in production.

But Kubernetes is much more than just "containers": it is a
distributed computing environment directly derived from Google's own
Borg system. I think it's a good candidate platform to run
some services on, but probably not MediaWiki or RESTBase for the
time being.

Cheers,

Giuseppe

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Evaluation of clustering solutions (continued)

2015-08-25 Thread Giuseppe Lavagetto
Hi all,

as previously announced, we've been evaluating a clustering solution
for use as an alternative to GridEngine for toollabs

https://lists.wikimedia.org/pipermail/wikitech-l/2015-August/082853.html

Our goal is to find a suitable, modern, stable tool to run not
only toollabs webservices, but also - in the longer term - our
microservices in production: a clustered environment that will allow us
to enhance single-service availability and to scale applications more
easily, further reducing the friction surface and the direct ops
involvement in the day-to-day setup and deployment of services.

Our evaluation of the available solutions is ongoing, and while we're
mostly done filling in an evaluation spreadsheet
(https://docs.google.com/spreadsheets/d/1YkVsd8Y5wBn9fvwVQmp9Sf8K9DZCqmyJ-ew-PAOb4R4/edit?usp=sharing),
we welcome and encourage further involvement and suggestions. You
can provide these easily on the tracking ticket for the evaluation,
https://phabricator.wikimedia.org/T106475

We have received some interesting feedback already, and we look forward to
incorporating more!

We are considering two solutions: Mesosphere's Marathon (which is
based on Mesos) - https://mesosphere.github.io/marathon/ - and Google's
Kubernetes - https://kubernetes.io.

Now let us summarize a bit our findings so far:
MESOS/MARATHON:

Pros:
    - Mesos is stable and battle-tested, although Marathon is
quite young and mostly used in Mesosphere's commercial offering
- Supports overcommitting resources (which is important in
toollabs, probably less so in production)
- Has a nice, clean API and is fully distributed with no potential SPOFs
- Chronos is another framework that can run on mesos and is a
great distributed cron

Cons:
    - The multitenancy story is non-existent; it was not designed to
be a public PaaS offering. This is an issue even in production if we
want to grant independence to single teams.
    - Container support seems experimental at best (but getting
better in newer versions).
    - Adoption of Marathon seems limited and the community is not
very lively.
- Discovery/scaling logic is somewhat limited

KUBERNETES

Pros:
    - The design seems to be very well thought out, based on
experiences running Google's internal Borg system (see
http://research.google.com/pubs/pub43438.html for details of Google's
Borg clustering system).
- A pretty refined security model is already implemented, so
that single users/teams could be given access to individual namespaces
and act independently
- The community is very lively, and adoption is gaining
momentum: kubernetes is the default way to deploy apps on Google
Compute Engine, it's used by Red Hat for its own cloud solution (and
they contribute patches to it), it has a clear roadmap to overcome
most of its limitations
    - Container support is native and technology-agnostic,
allowing (for now) Docker and Rkt containers to be used
- The API is quite nice
- Documentation is decently complete
- Google engineers are actively supporting us in evaluating its usage
Cons:
- The master node is not highly available, although our
cluster survived a pretty serious outage in labs that froze the master
and wiped out one worker
    - No overcommitting is allowed; it will be possible to mimic it
with QoS (coming in the next version)
- The ability to schedule one-off jobs is offered, but there
is no distributed cron facility
- In general it's a younger project with some outstanding bugs

As you can see there are pretty big pros/cons for both these
technologies, due to the fact that they are not quite boring yet -
although one could argue that mesos and chronos at least have entered
their boring stage. Our spreadsheet slightly favours Kubernetes at
the moment, but that might change drastically if we determine that
some limitations are absolute showstoppers for us.

In the remainder of this week and the next few, we will keep
stress-testing both our test installations to flush out surprises and
bugs.

Let us know what you think - or reach out to us if you want to help in
this evaluation process. We will keep you posted!

Cheers,

Giuseppe & Yuvi

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Search errors on multiple Wikimedia projects

2015-06-15 Thread Giuseppe Lavagetto
On Mon, Jun 15, 2015 at 3:28 PM, Chad innocentkil...@gmail.com wrote:
 On Mon, Jun 15, 2015 at 12:12 AM Pine W wiki.p...@gmail.com wrote:

 I'm getting this same error on multiple Wikimedia projects:

 An error has occurred while searching: Search is currently too busy.
 Please try again later.

 Help?


 The elasticsearch cluster exploded completely. It's mostly recovered
 now (just waiting for full redundancy) and searches should now be
 back on for users. Full incident report to follow I imagine.


You are correct. I'm not making promises on how soon I'll get to write
it, though. Also, we'd probably need to do some root-cause
investigation (I think we do have a candidate, but I'd like some
Elasticsearch expert to take a more thorough look).

I will update this thread with a link to the outage report as soon as
it's ready.

Cheers

Giuseppe

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Multimedia team?

2015-05-13 Thread Giuseppe Lavagetto
While I know it doesn't sound fancy or attractive, I think the
multimedia team should have, as one of its focuses, helping with
transitioning the imagescalers to HHVM. There seem to be a few issues
with that, and some support that comes not just out of goodwill but
as a team commitment would greatly help.

Cheers,
Giuseppe

On Mon, May 11, 2015 at 4:33 PM, Brian Gerstle bgers...@wikimedia.org wrote:
 I'm also curious what our audio/video storage/transcoding/playback roadmap
 is.  IMO it's a pretty fundamental feature that isn't well supported in all
 the clients (especially mobile).  Could probably do some interesting audio
 stuff (e.g. narration in many languages) for visually impaired.

 On Mon, May 11, 2015 at 7:42 AM, Jean-Frédéric jeanfrederic.w...@gmail.com
 wrote:

 2015-05-11 10:29 GMT+01:00 Antoine Musso hashar+...@free.fr:

  On 11/05/15 02:18, Tim Starling wrote:
 
  On 10/05/15 07:06, Brian Wolff wrote:
 
  People have been talking about vr for a long time. I think there is
 more
  pressing concerns (e.g. video). I suspect VR will stay in the video
 game
  realm  or gimmick realm for a while yet
 
  Maybe VR is a gimmick, but VRML, or X3D as it is now called, could be
  a useful way to present 3D diagrams embedded in pages. Like SVG, we
  could use it with or without browser support.
 
 
  Hello,
 
  A potential use case for the encyclopedia, would be to display models of
  chemistry molecules. An example:
 
http://wiki.jmol.org/index.php/Jmol_MediaWiki_Extension
 

 See https://phabricator.wikimedia.org/project/profile/16/ and 
 https://phabricator.wikimedia.org/project/profile/804/

 --
 Jean-Frédéric
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l




 --
 EN Wikipedia user page: https://en.wikipedia.org/wiki/User:Brian.gerstle
 IRC: bgerstle
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Tor proxy with blinded tokens

2015-03-10 Thread Giuseppe Lavagetto
Hi Chris,

I like the idea in general, in particular the fact that only
established editors can ask for the tokens. What I don't get is why
this proxy should be run by someone other than the WMF, given - I
guess - it would be exposed as a Tor hidden service, which would
effectively mask the user's IP from us, secure their communication
from snooping by exit-node operators, and so on.

I guess the legitimate traffic on such a proxy would be so low (as
getting a token is /not/ going to be automated/immediate even for
logged-in users) that it could work without using up a lot of
resources.

Cheers,

Giuseppe

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] 503 errors in Phabricator

2015-03-06 Thread Giuseppe Lavagetto
Hi,

I'm using phabricator regularly this morning (including doing pretty
advanced searches) and I cannot reproduce the problem, but I am surely
no expert.

Is it still ongoing? Which URLs in particular are giving you 503s?

Cheers

Giuseppe

On Fri, Mar 6, 2015 at 9:02 AM, Pine W wiki.p...@gmail.com wrote:
 I'm repeatedly getting 503 errors when attemping to search Phabricator. Can
 someone check into this please?

 Pine
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Investigating building an apps content service using RESTBase and Node.js

2015-02-05 Thread Giuseppe Lavagetto
On Wed, Feb 4, 2015 at 6:59 AM, Marko Obrovac mobro...@wikimedia.org wrote:
 On Tue, Feb 3, 2015 at 8:42 PM, Tim Starling tstarl...@wikimedia.org
 wrote:

 I don't really understand why you want it to be integrated with
 RESTBase. As far as I can tell (it is hard to pin these things down),
 RESTBase is a revision storage backend and possibly a public API for
 that backend.


 Actually, RESTBase's logic applies to the Mobile Apps case quite naturally.
 When a page is fetched and transformed, it can be stored so that subsequent
 requests can simply retrieve the transformed document from storage.



OK, so in this vision RESTBase is a caching layer for revisions.
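
If I read that correctly, the pattern boils down to something like the
following sketch; the endpoint, the in-memory stand-in for the storage
bucket and the placeholder transform are all assumptions of mine:

    # Sketch of the fetch/transform/store pattern described above.
    import requests

    STORAGE = {}  # stand-in for RESTBase's revision storage

    def transform(html: str) -> str:
        # Placeholder for the real (e.g. mobile-specific) transform.
        return html.strip()

    def get_transformed(title: str) -> str:
        if title in STORAGE:           # later requests hit storage
            return STORAGE[title]
        r = requests.get(              # first request fetches + parses
            "https://en.wikipedia.org/w/api.php",
            params={"action": "parse", "page": title, "format": "json"},
            headers={"User-Agent": "transform-sketch/0.1 (example)"},
            timeout=10,
        )
        r.raise_for_status()
        html = r.json()["parse"]["text"]["*"]
        STORAGE[title] = transform(html)
        return STORAGE[title]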

 I thought the idea of SOA was to separate concerns.
 Wouldn't monitoring, caching and authorization be best done as a
 node.js library which RESTBase and other services use?


 Good point. Ideally, what we would need to do is provide the right tools to
 developers to create services, which can then be placed strategically
 around DCs (in cooperation with Ops, ofc). For v1, however, we plan to
 provide only logical separation (to a certain extent) via modules which can
 be dynamically loaded/unloaded from RESTBase. In return, RESTBase will
 provide them with routing, monitoring, caching and authorisation out of the
 box. The good point here is that this 'modularisation' eases the transition
 to a more-decomposed orchestration SOA model. Going in that direction,
 however, requires some prerequisites to be fulfilled, such as [1].


So, now RESTBase has become a router and an auth provider as well?
(Gabriel already clarified to me that "monitoring" means that RESTBase
will expose its own metrics for that specific service, so this is not
monitoring of the service at all, but rather accounting.)

I need some clarification at this point - what is RESTBase really
going to do? I'm asking because when I read "RESTBase will provide
them with routing, [...] and authorisation" I immediately think of a
request router and a general on-wiki auth provider. And we already
have both, and re-doing them in RESTBase would be plainly wrong.

Maybe you intend very specific things when you say "routing", and not
"request routing", which is what everybody here will think of.
And when you say "auth" you mean that RESTBase implements an auth
schema for its clients, so that no client can access data from another
one. If this is the case, I have some further questions: is this going
to be RBAC? Which permission models are you implementing? Are we sure
it is what we will need? And foremost: will this be exposable to
external consumers? Will it be able to hook up to our traditional wiki
auth scheme?

Can you please expand a bit on those concepts? Otherwise a lot of confusion,
uncertainty and doubt will spread amongst your fellow engineers,
resulting in a hostile view of what may be perfectly well-designed
software.

Cheers,

Giuseppe

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Microservices/SOA: let's continue the discussion

2015-02-04 Thread Giuseppe Lavagetto

Hi all,

ever since the Dev Summit discussions on SOA/Microservices[1], I have been 
pondering the outcomes, and I want to post some afterthoughts to these 
lists. Having been one of the most vocal in raising concerns about 
microservices, and having had experience with a heavily service-oriented web 
platform before, I think I owe my fellow engineers some lengthier 
explanations. Also, let me say that I am very happy with both the discussions 
we had at the Dev Summit and their outcomes - including the fact that the Ops and 
Services teams share the desire to work closely together on this.

I tried to write down some thoughts about this, and ended up with a way too 
long email. So I decided to put up a page on wikitech here:

https://wikitech.wikimedia.org/wiki/User:Giuseppe_Lavagetto/MicroServices


Apart from my blabbing, I have three questions on our strategy: how, when, what? 
None of this is clear to me as of today, and I wonder if anyone has a clear 
picture of where we want to be in 6 to 12 months with microservices. If someone 
has a clear plan, please speak up so that we can tackle the challenges ahead of 
us on a practical basis, and not just based on some grand principles :)


Cheers

Giuseppe

[1] I prefer the latter term, probably because SOA sounds bloated to me, and 
reminds me of enterprise software architectures that I don’t like.


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Investigating building an apps content service using RESTBase and Node.js

2015-02-03 Thread Giuseppe Lavagetto
On Wed, Feb 4, 2015 at 5:42 AM, Tim Starling tstarl...@wikimedia.org wrote:
 On 04/02/15 12:46, Dan Garry wrote:
 To address these challenges, we are considering performing some or all of
 these tasks in a service developed by the Mobile Apps Team with help from
 Services. This service will hit the APIs we currently hit on the client,
 aggregate the content we need on the server side, perform transforms we're
 currently doing on the client on the server instead, and serve the full
 response to the user via RESTBase. In addition to providing a public API
 end point, RESTBase would help with common tasks like monitoring, caching
 and authorisation.

 I don't really understand why you want it to be integrated with
 RESTBase. As far as I can tell (it is hard to pin these things down),
 RESTBase is a revision storage backend and possibly a public API for
 that backend. I thought the idea of SOA was to separate concerns.
 Wouldn't monitoring, caching and authorization be best done as a
 node.js library which RESTBase and other services use?


I agree with Tim. Using RESTBase as an integration layer for
everything is SOA done wrong. If we need to have an authorization
system, which is different from our APIs, we need to build it
separately, not to add levels of indirection.

Doing four things in one single service is basically rebuilding the
MediaWiki monolith, only in a different language :)

What you need, IMO, is a thin proxy layer in front of all the separate
APIs you have to call, including RESTBase for caching/revision
storage. It may be built into the app or, if it is consumed by
multiple apps, built as a thin proxy service itself.
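
To make "thin" concrete: such a layer can be little more than a
parallel fan-out plus a merge, along the lines of this sketch (the
endpoints, parameters and the merged output shape are assumptions on
my part, not a spec):

    # Sketch of a thin aggregation proxy: one app request fans out to
    # the separate APIs in parallel, then returns one merged document.
    from concurrent.futures import ThreadPoolExecutor
    import requests

    API = "https://en.wikipedia.org/w/api.php"
    HEADERS = {"User-Agent": "thin-proxy-sketch/0.1 (example)"}

    def fetch(params: dict) -> dict:
        r = requests.get(API, params={**params, "format": "json"},
                         headers=HEADERS, timeout=10)
        r.raise_for_status()
        return r.json()

    def aggregate(title: str) -> dict:
        with ThreadPoolExecutor() as pool:
            extract = pool.submit(fetch, {"action": "query",
                                          "prop": "extracts",
                                          "exintro": 1, "titles": title})
            images = pool.submit(fetch, {"action": "query",
                                         "prop": "images",
                                         "titles": title})
            return {"extract": extract.result(),
                    "images": images.result()}

Whether it lives in the app or as its own small service, the point is
that it only composes calls; it does not own storage, routing or auth.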

(I also don't get what "monitoring" means here, but someone could
probably explain it to me.)

Cheers,

Giuseppe

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] All non-api traffic is now served by HHVM

2014-12-10 Thread Giuseppe Lavagetto
On 09/12/14 23:10, Brian Wolff wrote:
 
 Awesome.
 
  Any chance the video scalers could be put near the top of the list for
 servers to upgrade Ubuntu on? The really old version of libav on those
 servers is causing problems for people uploading videos in certain
 formats.
 

Since API and appservers are done, we're left with the jobrunners (for
which the conversion is already done), the imagescalers and the
videoscalers. We are working right now on the imagescaler conversion; it
will require some preparation work and some testing, but hopefully it
won't take too long.

Cheers,

Giuseppe
-- 
Giuseppe Lavagetto
Wikimedia Foundation - TechOps Team

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] All non-api traffic is now served by HHVM

2014-12-09 Thread Giuseppe Lavagetto
On 03/12/14 18:03, Giuseppe Lavagetto wrote:
 Hi all,
[CUT]
 The API traffic is still being partially served by mod_php, but
 that will not be for long!
 

As promised, as of now all our API traffic is on HHVM as well.

The effects on CPU usage have been quite drastic on this cluster,
where the load is higher:

http://bit.ly/1Abwwzi

Cheers,
Giuseppe
-- 
Giuseppe Lavagetto
Wikimedia Foundation - TechOps Team

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] All non-api traffic is now served by HHVM

2014-12-03 Thread Giuseppe Lavagetto
Hi all,

it's been quite a journey since we started working on HHVM, and last
week (November 25th) HHVM was finally introduced to all users who didn't
opt in to the beta feature.

On Monday, we started reinstalling all 150 remaining
servers that were running Zend's mod_php, upgrading them from Ubuntu
Precise to Ubuntu Trusty in the process. It seemed like an enormous task
that would take me weeks to complete, even with the improved
automation we have built lately.

Thanks to the incredible work by Yuvi and Alex, who helped me basically
around the clock, today around 16:00 UTC we removed the last of the
mod_php servers from our application server pool: all the non-API
traffic is now being served by HHVM.

This new PHP runtime has already halved our backend latency and page
save times, and it has also significantly reduced the load on our
cluster (as I write this email, the average CPU load on the application
servers is around 16%, while it was easily above 50% in the pre-HHVM era).

The API traffic is still being partially served by mod_php, but that
will not be for long!

Cheers,

Giuseppe
-- 
Giuseppe Lavagetto
Wikimedia Foundation - TechOps Team

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Tor and Anonymous Users (I know, we've had this discussion a million times)

2014-10-01 Thread Giuseppe Lavagetto

On 30/09/14 23:02, Marc A. Pelletier wrote:
 On 09/30/2014 09:08 AM, Derric Atzrott wrote:
 [H]ow can we quantify the loss to Wikipedia, and to society at
 large, from turning away anonymous contributors? Wikipedians say
 'we have to blacklist all these IP addresses because of trolls'
 and 'Wikipedia is rotting because nobody wants to edit it
 anymore' in the same breath, and we believe these points are
 related.
 
 I've been doing admin work on enwiki since 2007 and I can give
 you two anecdotal data points:
 
 (a) Previously unknown TOR endpoints get found out because they 
 invariably are the source of vandalism and/or spam.
 
 (b) I have never seen a good edit from a TOR endpoint.  Ever.
 
 A third one I can add since I have held checkuser (2009):
 
 (c) I have never seen accounts created via TOR or that edited
 through TOR that weren't demonstrably block evasion, vandalism or
 (most often) spamming.
 
 None of this is TOR-specific; the same observations apply to open
 proxies in general, and to almost all hosted servers.
 Long blocks of open proxies or co-lo ranges that time out after
 *years* of being blocked invariably start spewing spam and vandalism,
 often the very day the block expires.
 

Hi Marc :)

I know I don't need to convince you that TOR is a good thing in general.

Still, I don't see how the abusive nature of what is being done via
TOR makes it less valuable to our community, in particular in the
post-Snowden era. Even without involving countries where freedom of
speech is not legally granted, it is reasonable to assume someone
making an edit that may look 'unfriendly' to the US or UK governments
will feel uncomfortable doing that without TOR.

If, as it seems right now, the problem is technical (weeding out the
bots and vandals) rather than ideological (as we allow anonymous
contributions after all), we can find a way to let people edit any
Wikipedia via TOR while minimizing the amount of vandalism that gets
through.

Of course, let's not kid ourselves - it will probably require some
special measures, and editing via TOR would likely end up not being
as easy as editing via a public-facing IP (we may e.g. restrict
publishing via TOR to users that have logged in and have made 5 good
edits reviewed by others, or we can use modern bot-detecting
techniques in that case - those are just ideas; see the sketch below).
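
As a straw man, such a gate could be as small as the following sketch;
the threshold, the User shape and the edit counter are hypothetical
illustrations, not an actual MediaWiki hook:

    # Straw-man sketch of the "5 reviewed good edits" gate above.
    from dataclasses import dataclass
    from typing import Optional

    MIN_REVIEWED_EDITS = 5  # assumed threshold, purely illustrative

    @dataclass
    class User:
        name: str
        is_anonymous: bool
        reviewed_good_edits: int  # assumed precomputed elsewhere

    def may_edit_via_tor(user: Optional[User]) -> bool:
        """Allow a Tor edit only for logged-in users with enough
        reviewed good edits; everyone else gets the usual Tor block."""
        if user is None or user.is_anonymous:
            return False
        return user.reviewed_good_edits >= MIN_REVIEWED_EDITS

    assert not may_edit_via_tor(None)
    assert may_edit_via_tor(User("Alice", False, 7))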

Cheers,
Giuseppe
-- 
Giuseppe Lavagetto
Wikimedia Foundation - TechOps Team

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l