Re: [Wikitech-l] What type of language is the wikitext ?

2017-07-05 Thread Gabriel Wicke
Here is an older blog post describing some of the issues in parsing
wikitext, including examples:
https://blog.wikimedia.org/2013/03/04/parsoid-how-wikipedia-catches-up-with-the-web/
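
To make the "not regular" point concrete: constructs like template
transclusions nest arbitrarily, so matching them needs a counter or stack,
which no regular expression can provide in general. A minimal illustrative
sketch in Python (not Parsoid's algorithm, and ignoring nowiki, comments and
other wikitext subtleties):

    def template_spans(text):
        """Return (start, end) offsets of outermost {{...}} transclusions."""
        spans, stack = [], []
        i = 0
        while i < len(text) - 1:
            pair = text[i:i + 2]
            if pair == "{{":
                stack.append(i)
                i += 2
            elif pair == "}}" and stack:
                start = stack.pop()
                if not stack:          # only record outermost spans
                    spans.append((start, i + 2))
                i += 2
            else:
                i += 1
        return spans

    print(template_spans("a {{cite|{{flag|US}}}} b {{done}}"))
    # [(2, 22), (25, 33)]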

On Wed, Jul 5, 2017 at 6:31 AM, יגאל חיטרון <khit...@post.bgu.ac.il> wrote:

> Hi. Any computer language that has nested parentheses isn't regular. So
> unless you are dealing with Assembly, that much is settled.
> Whether it is context-free is a trickier question. I wouldn't call it
> context free because of some macro expansions, for example the pipe trick
> ([[abc (def)|]] -> [[abc (def)|abc]]). I don't know if this problem will
> really affect you, because highlighting should not care about it.
> The first obstacle to context-freeness, templates, should not bother you
> either, because there is no template expansion on the page being
> highlighted, only at "runtime". Even the "subst" mechanism does not work
> before saving.
> I tried just now to find wikitext syntax constructs that create "really"
> context-sensitive problems, such as repetition (finding ww for some w), but
> did not find anything.
> Any other opinions?
> Igal (User:IKhitron)
>
>
> 2017-07-05 16:10 GMT+03:00 Kaartic Sivaraam <kaarticsivaraam91196@gmail.com>:
>
> > Dear all,
> >
> > Quoting from my previous post,
> >
> >“Currently the syntax highlighter of the Wikipedia android app seems
> >to be slow except on high-end devices. It has been proposed to
> >change the implementation to provide users with a better
> >(streamlined) experience while editing[1].”
> >
> >
> > I recently came to know from a reply to that post [2] that wikitext is
> > not a "regular language"[3]. I wanted to know what kind of language
> > wikitext is, to ensure that the algorithm for syntax highlighting does
> > the right thing. Is wikitext a "context-free language" or is it something
> > else?
> >
> >
> > Links
> > [1]: https://phabricator.wikimedia.org/T164936
> > [2]: https://lists.wikimedia.org/pipermail/mediawiki-l/2017-June/046627.html
> > [3]: https://en.wikipedia.org/wiki/Regular_language
> >
> > ---
> > Regards,
> > Kaartic
> >
> > ___
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>



-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [AI] ORES service is having trouble

2017-06-13 Thread Gabriel Wicke
I checked the pdfrender instances on the other SCB nodes, and some used 7G
and 15G respectively. My patch limits this to 2G, which should be enough
for normal operation.

Memory on the SCB nodes has been tight for a while. I think there are plans
to move OCG (which uses the vast bulk of memory) to a separate cluster.
There is also https://phabricator.wikimedia.org/T146664 for limiting the
memory used by ORES.

On Tue, Jun 13, 2017 at 3:29 PM, Daniel Zahn <dz...@wikimedia.org> wrote:

>- 19:24 mutante: scb1001 - killed process 10971 (pdfrendering/electron)
>
> This apparently fixed it.
>
> I see there is already this https://gerrit.wikimedia.org/r/35 now
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>



-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Setting up multiple Parsoid servers behind load balancer

2017-06-09 Thread Gabriel Wicke
On Fri, Jun 9, 2017 at 8:25 AM, James HK <jamesin.hongkon...@gmail.com>
wrote:

> This sounds like a lot of sublayers that can potentially disrupt a
> simple editing process, and I wonder, of the many non-WMF MediaWiki
> installations and administrators, who will be able and capable of
> debugging those once an issue arises.
>

This is a familiar pattern in the history of computers. Early computers
were programmed in assembly, until complexity was added with compilers.
Early wikis were simple Perl CGI scripts backed by files, until Wikipedia's
scale (traffic and organizational), security and feature requirements made
it necessary to add caching layers, isolated services, and distributed
storage systems.

Each of these steps added layers of abstraction and complexity, and
concerns about understanding all those layers were (rightfully) brought up
at each step along the way. And yet the move towards higher levels of
abstraction has been highly successful. Complex systems like web browsers
or even entire distributed system clusters can now be deployed with a
single click, on largely commoditized platforms.

We are not yet at the point where we can offer you this degree of
automation for MediaWiki, but we are working on it.
-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Setting up multiple Parsoid servers behind load balancer

2017-06-09 Thread Gabriel Wicke
On Fri, Jun 9, 2017 at 12:56 AM, Alexandros Kosiaris <
akosia...@wikimedia.org> wrote:
>
> I also don't think you need RESTBase as long as you are willing to
> wait for parsoid to finish parsing and returning the result.


Apart from performance, there is also functionality that is missing without
RESTBase:

   - Diffs are going to contain a lot of extra changes (commonly called
   "dirty diffs"), as no original HTML or data-parsoid is available to
   Parsoid's selective serialization algorithm. This might make it difficult
   to review changes.
   - Switching between wikitext and visual editing won't work.
   - Visual editing in general will very likely stop working once we reduce
   the size of HTML by separating out metadata (see
   https://phabricator.wikimedia.org/T78676). We keep pushing this back due
   to a lack of resources, but it is still planned, and might happen within
   the next six months.

In short, using Parsoid directly for visual editing is an unsupported
configuration, and is likely to stop working altogether in the foreseeable
future.


On Thu, Jun 8, 2017 at 7:10 AM, James Montalvo wrote:

> I've read through the documentation I think you're talking about. It's kind
> of hard to determine where to start since the docs are spread out between
> multiple VE, Parsoid and RESTBase pages. Installing RESTBase is, as you
> say, straightforward (git clone, npm install, basically). Configuring is
> not clear to me, and without clear docs it's the kind of thing that takes
> hours of trial and error.


The RESTBase install instructions point to a fairly well-commented example
configuration file:
https://github.com/wikimedia/restbase/blob/master/config.example.yaml

For a basic install, all you should need to do is adjust the lines marked with
XXX in there. The default backend will use SQLite. Cassandra offers better
scalability and distribution for large scale, but this is not likely
something you need. A single SQLite-backed RESTBase instance and a single
Parsoid instance should be all you need.

We are aware of the complexity of setting up a fully featured MediaWiki
system, and are working on a Kubernetes-based solution right now (see
https://github.com/wikimedia/mediawiki-containers/blob/k8s/README.k8s.md
for current work in progress). The early prototype already sets up
MediaWiki, VisualEditor, RESTBase, Parsoid, Math, as well as other services
like EventBus. The current work is primarily aimed at development and
testing, but we expect it to also offer a quick way to spin up a complete &
fully-featured containerized MediaWiki system for small installs.

Hope this helps,

Gabriel
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Wikimedia REST API hits v1.0

2017-04-06 Thread Gabriel Wicke
It is official: The Wikimedia REST API
<https://www.mediawiki.org/wiki/REST_API>, your scalable and fresh source
of Wikimedia content and data in machine-readable formats, is now ready for
full production use. The 1.0 release means that you can now fully rely on
the stability guarantees set out in the API versioning policy
<https://www.mediawiki.org/wiki/API_versioning>. Read more about the
stability levels, use cases, as well as technical background on how the
REST API integrates with our caching layers in our blog post:


https://blog.wikimedia.org/2017/04/06/wikimedia-rest-api/


We are looking forward to your feedback at
https://www.mediawiki.org/wiki/Talk:REST_API, or here on-list.


This release was made possible by the hard work of many. First of all,
the Services
team <https://www.mediawiki.org/wiki/Wikimedia_Services> (Marko Obrovac,
Petr Pchelko and Eric Evans) created the general API proxy and storage
functionality, and curated the API documentation
<https://en.wikipedia.org/api/rest_v1/>. The actual end points are
co-designed with, and largely backed by, services developed by the
following WMF teams: Editing (Parsing
<https://www.mediawiki.org/wiki/Parsing> and citoid
<https://www.mediawiki.org/wiki/Citoid>), Reading
<https://www.mediawiki.org/wiki/Reading> (Infrastructure
<https://www.mediawiki.org/wiki/Wikimedia_Reading_Infrastructure_team> and
Web <https://www.mediawiki.org/wiki/Reading/Web>), and Analytics
<https://www.mediawiki.org/wiki/Analytics>. Volunteer Moritz Schubotz and
the MathJax <https://www.mathjax.org/> community contributed the math end
points, and the PDF end point is powered by the open source
electron-render-service <https://github.com/msokk/electron-render-service>
project. Finally, the WMF techops team
<https://www.mediawiki.org/wiki/Wikimedia_Technical_Operations> runs the
excellent caching infrastructure that makes this API scale so well, and
has helped with many aspects, from hardware procurement to firewalling.


Thank you all for your hard work!


We are looking forward to continuing to work with you all on making this
API an even better platform for building user experiences, services, and
tools.


Cheers,


Gabriel and the Services team


-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] ArchCom Minutes, News, and Outlook

2017-03-17 Thread Gabriel Wicke
The discussion notes are now also available on-wiki at
https://www.mediawiki.org/wiki/Reading/Web/Projects/OCCAM/ArchCom-03-15-2017
.

Technical note: I pasted the contents into
https://gwicke.github.io/paste2wiki/, which uses the REST API and
Parsoid to convert HTML to wikitext. The biggest difference from pasting
straight into VisualEditor is that it preserves inline HTML links. See
https://phabricator.wikimedia.org/T129546 for a discussion about optionally
allowing pasting of links into VE as well.
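
For reference, the same conversion can be driven directly against the REST
API's HTML-to-wikitext transform end point. A rough sketch in Python (assumes
the third-party 'requests' package; see /api/rest_v1/?doc for the exact
parameters and content types):

    import requests

    html = '<p>Some <a rel="mw:ExtLink" href="https://example.org">pasted</a> content.</p>'

    resp = requests.post(
        "https://en.wikipedia.org/api/rest_v1/transform/html/to/wikitext",
        data={"html": html},
        timeout=30,
    )
    print(resp.text)  # wikitext produced by Parsoid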

On Thu, Mar 16, 2017 at 8:15 AM, Jan Dittrich <jan.dittr...@wikimedia.de>
wrote:

> > Doesn't etherpad (https://etherpad.wikimedia.org/) fit that need
> > without being proprietary?
>
> I found that I miss the comment feature, but it can be plugged in into
> etherpad.
>
> Jan
>
> 2017-03-16 14:21 GMT+01:00 Brad Jorsch (Anomie) <bjor...@wikimedia.org>:
>
> > On Thu, Mar 16, 2017 at 6:40 AM, Dan Garry <dga...@wikimedia.org> wrote:
> >
> > > Google Docs is easier to spin up in the moment and edit collaboratively
> > > than MediaWiki. Using proprietary software tools if they're a better
> fit
> > > for the intended purpose is entirely consistent with the Foundation's
> > > guiding principles
> > >
> >
> > Doesn't etherpad (https://etherpad.wikimedia.org/) fit that need without
> > being proprietary?
> >
> > --
> > Brad Jorsch (Anomie)
> > Senior Software Engineer
> > Wikimedia Foundation
> > ___
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> >
>
>
>
> --
> Jan Dittrich
> UX Design/ User Research
>
> Wikimedia Deutschland e.V. | Tempelhofer Ufer 23-24 | 10963 Berlin
> Phone: +49 (0)30 219 158 26-0
> http://wikimedia.de
>
> Imagine a world, in which every single human being can freely share in the
> sum of all knowledge. That's our commitment.
>
> Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e. V.
> Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg unter
> der Nummer 23855 B. Als gemeinnützig anerkannt durch das Finanzamt für
> Körperschaften I Berlin, Steuernummer 27/029/42207.
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>



-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] ArchCom Minutes, News, and Outlook

2017-03-15 Thread Gabriel Wicke
Thanks to everyone who participated in the discussion. Unfortunately, we
ran into a technical issue with setting up the YouTube stream that we
weren't able to resolve quickly (my apologies to those unable to follow the
stream), but we did take detailed notes that you can peruse and comment on.

As a next step, the Reading team will do some more research & document more
specifics on requirements and solutions. Following this, we will have
another round of discussion.

Thanks,

Gabriel
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] ArchCom Minutes, News, and Outlook

2017-03-15 Thread Gabriel Wicke
Reminder: this is about to start in a couple of minutes.


   - What: High level mobile frontend requirements & plans
      - Agenda / discussion notes
        <https://docs.google.com/document/d/1jlBl_qAIrGPF7zqOmK77Db8y4InO1j3qX1qqipzJNmc/edit#>
   - When: March 15, 2-3pm PDT (San Francisco)
   - Where:
      - Stream: http://youtu.be/8W7WrTa3Py4
      - Hangout (25 active participants max):
        <https://hangouts.google.com/hangouts/_/ytl/T7sMtE_gUxWZ4biKxPh5ffreSnwnrIj1L7udZWXlKSk>


On Tue, Mar 14, 2017 at 11:49 AM, Adam Baso <ab...@wikimedia.org> wrote:

> The doc is now public read.
>
> https://docs.google.com/document/d/1id-E_KELGGA3X5H4K44I6zIX3SEgZ0sF_biOY4INCqM
>
>
> On Fri, Mar 10, 2017 at 4:18 PM, Adam Baso <ab...@wikimedia.org> wrote:
>
> > I made a request to the document author for that, I imagine should be
> > available next week. Nothing secret in there, though.
> >
> > On Fri, Mar 10, 2017 at 4:16 PM, Legoktm <legoktm.wikipe...@gmail.com>
> > wrote:
> >
> >> Hi,
> >>
> >> On 03/09/2017 08:17 AM, Daniel Kinzler wrote:
> >> > Next week’s RFC meeting (tentative, pending confirmation):
> >> > * explore High-Level MobileFrontend Requirements (JavaScript frameworks,
> >> > Progressive Apps, and all that jazz)
> >> > <https://docs.google.com/document/d/1id-E_KELGGA3X5H4K44I6zIX3SEgZ0sF_biOY4INCqM/edit#heading=h.xs2aq4j4wzse>.
> >>
> >> I didn't try to open the link until now, but it requires a Google
> >> account to view, and is only visible to those in the WMF - could it be
> >> moved to mediawiki.org please?
> >
> >
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>



-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] ArchCom Minutes, News, and Outlook

2017-03-10 Thread Gabriel Wicke
On Fri, Mar 10, 2017 at 12:06 PM, Legoktm 
wrote:

> Hi,
>
> On 03/09/2017 08:17 AM, Daniel Kinzler wrote:
> > * NOTE: we plan to experiment with having a public HANGOUT meeting,
> instead of
> > using IRC.
>
> Can I ask why?


With audio and video, hangouts provide a somewhat higher bandwidth, and
avoid the problem of many people discussing several topics at the same time
that larger IRC meetings frequently run into.


> At least for me, Google Hangouts simply isn't an option
> to participate when I'm in a crowded library/classroom.


You can still listen in & ask questions via the hangout chat, or the
regular office IRC channel. Adam volunteered to monitor the channel, so
that questions can be addressed. The meeting will also be recorded at the
youtube link I provided, so you can catch up later, and ask follow-up
questions by mail.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] ArchCom Minutes, News, and Outlook

2017-03-10 Thread Gabriel Wicke
On Thu, Mar 9, 2017 at 8:17 AM, Daniel Kinzler 
wrote:
>
> Next week’s RFC meeting (tentative, pending confirmation):
> * explore High-Level MobileFrontend Requirements (JavaScript frameworks,
> Progressive Apps, and all that jazz)
> <https://docs.google.com/document/d/1id-E_KELGGA3X5H4K44I6zIX3SEgZ0sF_biOY4INCqM/edit#heading=h.xs2aq4j4wzse>.
> * NOTE: we plan to experiment with having a public HANGOUT meeting,
> instead of
> using IRC.
>

This is now confirmed.

   - What: High level mobile frontend requirements & plans
   - When: March 15, 2-3pm PDT (San Francisco)
   - Where:
  - Stream: http://youtu.be/8W7WrTa3Py4
  - Hangout (25 active participants max):
  
https://hangouts.google.com/hangouts/_/ytl/T7sMtE_gUxWZ4biKxPh5ffreSnwnrIj1L7udZWXlKSk
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] (Belated) ArchCom meeting notes 2017-02-15

2017-02-21 Thread Gabriel Wicke
On Tue, Feb 21, 2017 at 1:15 PM, Samuele Mantani <samuelemant...@gmail.com>
wrote:

> Interesting. A question but in short the Rest API would be?
>

Are you asking about the location of the REST API? If so, it is available
for each domain at /api/rest_v1/. For English Wikipedia, see
https://en.wikipedia.org/api/rest_v1/.


>
> Samuele2002
>
> Il 21/Feb/2017 10:06 PM, "Gabriel Wicke" <gwi...@wikimedia.org> ha
> scritto:
>
> > Hi, here is a brief summary from last week's ArchCom meeting. Apologies
> > for the delay.
> >
> > Internal discussions
> >
> > - Internal plans for cleaning up a provisional ArchCom charter.
> > - How we manage the RfC board.
> >
> > RFC activity
> >
> > - Entering Final Comment Period: T122942: Support language variants in the
> >   REST API <https://phabricator.wikimedia.org/T122942>
> >   - Consensus has formed around using Accept-Language headers for now,
> >     leaving path prefixes as a future addition.
> >   - Based on the discussion on the task, the ArchCom will either make a
> >     decision or allow for more discussion in its meeting on Wednesday,
> >     February 22nd.
> > - T66214: Define an official thumb API
> >   <https://phabricator.wikimedia.org/T66214>: Needs an update to the
> >   summary to reflect the recent discussion, should be ready for last call
> >   then.
> >
> >
> > --
> > Gabriel Wicke
> > Principal Engineer, Wikimedia Foundation
> > ___
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l




-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] (Belated) ArchCom meeting notes 2017-02-15

2017-02-21 Thread Gabriel Wicke
Hi, here is a brief summary from last week's ArchCom meeting. Apologies for
the delay.

Internal discussions

   - Internal plans for cleaning up a provisional ArchCom charter.
   - How we manage the RfC board.

RFC activity

   - Entering Final Comment Period: T122942: Support language variants in the
     REST API <https://phabricator.wikimedia.org/T122942>
     - Consensus has formed around using Accept-Language headers for now,
       leaving path prefixes as a future addition.
     - Based on the discussion on the task, the ArchCom will either make a
       decision or allow for more discussion in its meeting on Wednesday,
       February 22nd.
   - T66214: Define an official thumb API
     <https://phabricator.wikimedia.org/T66214>: Needs an update to the
     summary to reflect the recent discussion, should be ready for last call
     then.


-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Discussion Platform

2016-12-06 Thread Gabriel Wicke
Some of us have been using matrix.org, which is an open system with many of
the features expected from a modern chat system. It also bridges to IRC,
but usability and reliability of that bridging can still be improved. See
https://meta.wikimedia.org/wiki/User:GWicke/Matrix.org for instructions.

On Tue, Dec 6, 2016 at 6:13 AM, Brad Jorsch (Anomie) <bjor...@wikimedia.org>
wrote:

> On Tue, Dec 6, 2016 at 7:50 AM, MAYANK JINDAL <mayank.jind...@gmail.com>
> wrote:
>
> > We are using IRC for discussion purposes. How would it be if we changed our
> > discussion platform?
> > Many organizations have switched to Gitter, which has a very user-friendly
> > UI and is very easy to use.
> > Please share your views on my proposal.
> >
>
> It seems very unlikely that we would gain much by moving from an
> established open standard to a proprietary walled garden service.
>
> --
> Brad Jorsch (Anomie)
> Senior Software Engineer
> Wikimedia Foundation
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>



-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] 2016W43 ArchCom-RFC meeting: Allow HTML in SVG?

2016-10-25 Thread Gabriel Wicke
See also https://phabricator.wikimedia.org/T96461, which discusses using
https://github.com/cure53/DOMPurify, and Parsoid's Token-based sanitizer.

On Tue, Oct 25, 2016 at 6:12 PM, Legoktm <legoktm.wikipe...@gmail.com>
wrote:

> Hi,
>
> On 10/25/2016 03:14 PM, Rob Lanphier wrote:
> > 3.  Should we turn our SVG validation code into a proper library?
>
> Yes! This is <https://phabricator.wikimedia.org/T86874>. :)
>
> -- Legoktm
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>



-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] RESTBase multiple pages with one request

2016-08-04 Thread Gabriel Wicke
Toni, we heavily use caching to speed up the REST API, so making individual
requests is the fastest way to retrieve content. You can use parallelism to
achieve your desired throughput, and with HTTP/2 all those parallel
requests can even share a single TCP connection. The request limit for the
API overall is 200 req/s, as documented in
https://en.wikipedia.org/api/rest_v1/?doc.
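
As an illustration, a minimal client sketch (not an official library; assumes
the third-party 'requests' package, and the User-Agent contact address is a
hypothetical placeholder) that fetches several summaries in parallel while
staying far below that limit:

    import concurrent.futures
    import requests

    API = "https://en.wikipedia.org/api/rest_v1/page/summary/"
    HEADERS = {"User-Agent": "example-client/0.1 (ops@example.org)"}  # hypothetical contact

    def fetch_summary(title):
        resp = requests.get(API + title, headers=HEADERS, timeout=10)
        resp.raise_for_status()
        return resp.json()

    titles = ["Electron", "Dog", "Photon"]
    # Five worker threads keep the request rate well under 200 req/s.
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as pool:
        for summary in pool.map(fetch_summary, titles):
            print(summary["title"], "-", summary.get("extract", "")[:60])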

Hope this helps,

Gabriel

On Thu, Aug 4, 2016 at 2:20 PM, Jaime Crespo <jcre...@wikimedia.org> wrote:

> Sorry, I am not 100% sure. If that is true, maybe creating a feature
> request would help suggest its implementation?
>
> On Thu, Aug 4, 2016 at 3:09 PM, Toni Hermoso Pulido <toni...@cau.cat>
> wrote:
> > Thanks Jaime, so it only works with Action (MediaWiki default) API so
> > far, doesn't it?
> >
> > El 08/04/2016 a les 10:07 AM, Jaime Crespo ha escrit:
> >> Hi, you can combine multiple pages with the "pipe" sign:
> >>
> >> Check:
> >> <https://en.wikipedia.org/w/api.php?action=query&prop=revisions&rvprop=content&format=jsonfm&titles=Hillary_Clinton|Donald_Trump>
> >
> >> (change 'jsonfm' for 'json' on a real request)
> >> There is a limit on the number of pages depending on your account
> >> rights, but it is very helpful to avoid round-trip latencies for us in
> >> high-latency places.
> >>
> >>
> >> On Thu, Aug 4, 2016 at 9:34 AM, Toni Hermoso Pulido <toni...@cau.cat>
> wrote:
> >>> Hello,
> >>>
> >>> is it already possible to retrieve data from different pages just by
> >>> using one request?
> >>>
> >>> E.g by combining:
> >>> https://en.wikipedia.org/api/rest_v1/page/summary/Electron
> >>> and
> >>> https://en.wikipedia.org/api/rest_v1/page/summary/Dog
> >>>
> >>>
> >
> > --
> > Toni Hermoso Pulido
> > http://www.cau.cat
> > http://www.similis.cc
> >
> >
> > _______
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
>
>
> --
> Jaime Crespo
> <http://wikimedia.org>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>



-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Loosing the history of our projects to bitrot. Was: Acquiring list of templates including external links

2016-08-01 Thread Gabriel Wicke
> One possibility is considering storing rendered HTML for old revisions. It
> lets wikitext (and hence parser) evolve without breaking old revisions. Plus
> rendered HTML will use the template revision at the time it was rendered vs.
> the latest revision (this is the problem Memento tries to solve).

Long term HTML archival is a something we have been gradually working
towards with RESTBase.

Since HTML is about 10x larger than wikitext, a major concern is storage
cost. Old estimates <https://phabricator.wikimedia.org/T97710> put the
total storage needed to store one HTML copy of each revision at roughly
120T. To reduce this cost, we have since implemented several improvements
<https://phabricator.wikimedia.org/T93751>:


   - Brotli compression <https://en.wikipedia.org/wiki/Brotli>, once
   deployed, is expected to reduce the total storage needs to about
   1/4-1/5x over gzip <https://phabricator.wikimedia.org/T122028#2004953>.
   - The ability to split latest revisions from old revision lets us use
   cheaper and slower storage for old revisions.
   - Retention policies let us specify how many renders per revision we
   want to archive. We currently only archive one (the latest) render per
   revision, but have the option to store one render per $time_unit. This is
   especially important for pages like [[Main Page]], which are rarely edited,
   but constantly change their content in meaningful ways via templates. It is
   currently not possible to reliably cite such pages, without resorting to
   external services like archive.org.
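
For a rough feel of the gzip-vs-Brotli difference mentioned in the first
bullet above, here is a back-of-the-envelope sketch (assumes the third-party
'brotli' Python module; the actual ratios depend heavily on the input, so
treat the output as illustrative only):

    import gzip

    import brotli  # third-party module, e.g. `pip install brotli`

    # Repetitive Parsoid-style markup compresses very well with both codecs.
    html = ('<p about="#mwt1" typeof="mw:Transclusion">example paragraph</p>\n'
            * 2000).encode('utf-8')

    gz = gzip.compress(html, compresslevel=9)
    br = brotli.compress(html, quality=11)

    print('raw:   ', len(html), 'bytes')
    print('gzip:  ', len(gz), 'bytes')
    print('brotli:', len(br), 'bytes')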


Another important requirement for making HTML a useful long-term archival
medium is to establish a clear standard for HTML structures used. The
versioned Parsoid HTML spec
<https://www.mediawiki.org/wiki/Specs/HTML/1.2.1>, along with format
migration logic for old content, are designed to make the stored HTML as
future-proof as possible.

While we currently only have space for a few months worth of HTML
revisions, we do expect the changes above to make it possible to push this
to years in the foreseeable future without unreasonable hardware needs.
This means that we can start building up an archive of our content in a
format that is not tied to the software.

Faithfully re-rendering old revisions is harder in retrospect. We will
likely have to make some trade-offs between fidelity & effort.

Gabriel


On Mon, Aug 1, 2016 at 2:01 PM, David Gerard <dger...@gmail.com> wrote:

> On 1 August 2016 at 17:37, Marc-Andre <m...@uberbox.org> wrote:
>
> > We need to find a long-term view to a solution.  I don't mean just
> keeping
> > old versions of the software around - that would be of limited help.
> It's
> > be an interesting nightmare to try to run early versions of phase3
> nowadays,
> > and probably require managing to make a very very old distro work and
> > finding the right versions of an ancient apache and PHP.  Even *building*
> > those might end up being a challenge... when is the last time you saw a
> > working egcs install? I shudder to think how nigh-impossible the task might be 100
> > years from now.
>
>
> oh god yes. I'm having this now, trying to revive an old Slash
> installation. I'm not sure I could even reconstruct a box to run it
> without compiling half of CPAN circa 2002 from source.
>
> Suggestion: set up a copy of WMF's setup on a VM (or two or three),
> save that VM and bundle it off to the Internet Archive as a dated
> archive resource. Do this regularly.
>
>
> - d.
>
> ___________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>



-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Shutting down deprecated rest.wikimedia.org by September 1st

2016-07-22 Thread Gabriel Wicke
We are going to finally switch off the deprecated domain rest.wikimedia.org
by September 1st. This should not affect any REST API users, as this domain
has been officially deprecated since January & sunset since April [1].
Since then, requests to that domain have returned an error informing users
about the move.

Access to the REST API is exclusively through the main project domains,
following this pattern:

  https://en.wikipedia.org/api/rest_v1/?doc

Thank you for your cooperation,

-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation

[1]: https://lists.wikimedia.org/pipermail/wikitech-l/2016-April/085309.html
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] 2016-06-01 Scrum of Scrums meeting notes

2016-06-03 Thread Gabriel Wicke
On Fri, Jun 3, 2016 at 10:21 AM, James Forrester
 wrote:
> On 3 June 2016 at 00:14, Pine W  wrote:
>
>> I'd like to request a clarification about RESTBase.
>>
>> These notes say:
>> "* RESTBase
>> ** enforcing rate limits as of today
>> *** pageview: 10 req/s
>> *** transforms: 5 req/s"
>>
>> Can you expand on what kinds of requests these rate limits are limiting?

Small correction: The "pageview" reference here is to the page view
API end points:

- 
https://wikimedia.org/api/rest_v1/?doc#!/Pageviews_data/get_metrics_pageviews_per_article_project_access_agent_article_granularity_start_end
- 
https://wikimedia.org/api/rest_v1/?doc#!/Pageviews_data/get_metrics_pageviews_aggregate_project_access_agent_granularity_start_end
- 
https://wikimedia.org/api/rest_v1/?doc#!/Pageviews_data/get_metrics_pageviews_top_project_access_year_month_day

See https://phabricator.wikimedia.org/T135240 for background on these
limits on the pageview API. tl;dr: The backend for pageview data has
limited traffic capacity right now, but the analytics team is working
on a hardware upgrade. In the meantime, we need to make sure that the
limited capacity is used fairly, by enforcing conservative per-client
limits.
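
For scripts that read these end points in a loop, a very simple client-side
throttle keeps you inside the per-client limit. A sketch (assumes the
third-party 'requests' package; the per-article route shown follows the
documentation at https://wikimedia.org/api/rest_v1/?doc, so double-check the
exact path there):

    import time

    import requests

    LIMIT_PER_SEC = 10  # pageview limit mentioned above
    BASE = ("https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/"
            "en.wikipedia/all-access/all-agents/{title}/daily/{start}/{end}")

    def fetch_pageviews(titles, start="20160501", end="20160531"):
        results = {}
        for i, title in enumerate(titles):
            if i and i % LIMIT_PER_SEC == 0:
                time.sleep(1.0)  # crude pacing: at most ~10 requests per second
            url = BASE.format(title=title, start=start, end=end)
            results[title] = requests.get(url, timeout=10).json()
        return results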

> Unless you're running a tool against
> this API at a very high speed, ignoring the Terms of Use, it won't affect
> you.

Indeed. For the (relatively expensive) transform end points, we picked
the limits so that none of the current users are actually affected by
them.

Gabriel

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Reviving SVG client-side rendering task

2016-05-13 Thread Gabriel Wicke
Another option might be to piggy-back on the current work towards
lazy-loaded images [1]. Since this is using JS, it could take into
account network performance & screen resolutions, in addition to
browser capabilities. Designing this to degrade gracefully without JS
might be a bit tricky, though.

Gabriel

[1]: https://phabricator.wikimedia.org/T124390

On Fri, May 13, 2016 at 3:29 PM, Bartosz Dziewoński <matma@gmail.com> wrote:
> On 2016-05-13 22:28, Jon Robson wrote:
>>
>> The ResourceLoaderImage module is being used widely to generate SVG
>> icons with png fallbacks. I'd be interested in seeing if we can use
>> this in some way for optimising SVGs and removing meta data.
>
>
> I don't know what you have in mind, but please remember that
> ResourceLoaderImage was not written with security in mind. It has a very
> simplified version of our usual SVG rendering code, and it assumes that any
> SVG files passed to it is trusted. We traded some caution for some
> performance. Giving it user-controlled data is going to result in security
> vulnerabilities (at the very least some denial of service ones).
>
> --
> Bartosz Dziewoński
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l



-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Docs, use of, and admin privileges for wikimedia github project?

2016-04-25 Thread Gabriel Wicke
On Mon, Apr 25, 2016 at 8:19 AM, Brion Vibber  wrote:
> More importantly,
> when folks have repos that they've been running on GitHub already and want
> to move into the wikimedia project (rather than switch to gerrit), what's
> the procedure? I'm an admin/owner so I can manually import people's repos
> but I'm not sure whether I'm supposed to... :)

The method we have been using is via 'transfer ownership' in the
original repo settings. I believe moving the repo to the wikimedia org
requires owner permissions, so for repos owned by non-owners this
might require two transfers: One to an owner, then from owner to the
org.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Deprecating rest.wikimedia.org in favor of /api/rest_v1/

2016-04-14 Thread Gabriel Wicke
Final reminder on this: We are planning to finally sunset
rest.wikimedia.org in the week starting April 25th, 1 1/2 weeks from
now. Please move your REST API clients to /api/rest_v1/ at the regular
project domains instead!
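
If you still have hard-coded URLs of the old form, a small one-off helper
along these lines can rewrite them (a sketch based on the mapping quoted
below, not an official tool):

    from urllib.parse import urlsplit

    def migrate_rest_url(old_url):
        """Rewrite https://rest.wikimedia.org/<domain>/v1/... to
        https://<domain>/api/rest_v1/..."""
        parts = urlsplit(old_url)
        if parts.netloc != "rest.wikimedia.org":
            return old_url  # already on a project domain
        domain, _, rest = parts.path.lstrip("/").partition("/")
        if rest.startswith("v1/"):
            rest = rest[len("v1/"):]
        new_url = "https://{}/api/rest_v1/{}".format(domain, rest)
        return new_url + ("?" + parts.query if parts.query else "")

    print(migrate_rest_url("https://rest.wikimedia.org/en.wikipedia.org/v1/?doc"))
    # https://en.wikipedia.org/api/rest_v1/?doc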

Thanks,

Gabriel

On Mon, Jan 25, 2016 at 11:00 AM, Gabriel Wicke <gwi...@wikimedia.org> wrote:
> We have decided to officially retire the rest.wikimedia.org domain in
> favor of /api/rest_v1/ at each individual project domain. For example,
>
>
>   https://rest.wikimedia.org/en.wikipedia.org/v1/?doc
>
> becomes
>
>   https://en.wikipedia.org/api/rest_v1/?doc
>
> Most clients already use the new path, and benefit from better
> performance from geo-distributed caching, no additional DNS lookups,
> and sharing of TLS / HTTP2 connections.
>
> We intend to shut down the rest.wikimedia.org entry point around
> March, so please adjust your clients to use /api/rest_v1/ soon.
>
> Thank you for your cooperation,
>
> Gabriel
>
> --
> Gabriel Wicke
> Principal Engineer, Wikimedia Foundation



-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] REST API to follow page redirects from April 25th

2016-04-14 Thread Gabriel Wicke
We are planning to enable automatic redirect following in all REST API
[1] HTML entry points on April 25th. When responding to a request for
a redirected title [2], the response headers will contain:

Status: 302
Location: 

For most clients, this means that their HTTP client will automatically
follow redirects, simplifying common use cases. The few clients with a
need to retrieve the redirect page content itself have two options:

1) Disable following redirects in the client. For HTML and
data-parsoid entry points, the response still includes the HTML body &
regular response headers like the ETag.

2) Send a `?redirect=false` query string parameter. This option is
recommended for browsers, which lack control over redirect behavior
for historical security reasons.
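
A sketch of both options using the third-party 'requests' package
("Some_redirect_page" is a hypothetical title standing in for any page that
is a redirect):

    import requests

    url = "https://en.wikipedia.org/api/rest_v1/page/html/Some_redirect_page"

    # Option 1: keep the 302 and inspect the target; the body and headers
    # (e.g. the ETag) are still part of the response for HTML/data-parsoid.
    resp = requests.get(url, allow_redirects=False)
    print(resp.status_code, resp.headers.get("Location"))

    # Option 2: ask the API itself not to redirect.
    resp = requests.get(url, params={"redirect": "false"})
    print(resp.status_code)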

If you do have a need to avoid following redirects, you can make these
changes before the feature is enabled. Internally, we have already
done so for VisualEditor and the Mobile Content Service. See also
https://phabricator.wikimedia.org/T118548 for background & related
discussion.

Let us know if you have any concerns or questions about this.

Thanks,

Gabriel Wicke for the Wikimedia Services Team

[1]: https://en.wikipedia.org/api/rest_v1/?doc (using en.wikipedia.org
as an example)
[2]: https://www.mediawiki.org/wiki/Help:Redirects

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] ArchCom RFC update #5

2016-04-13 Thread Gabriel Wicke
This week, JavaScript module interfaces in ResourceLoader were merged, the
ServiceLocator implementation continued, and there was a lively discussion of
options for balancing templates on IRC. Shadow namespaces are scheduled for
discussion at next week's IRC meeting.

Gabriel

RFC inbox

   - T30085: RFC: Allow user login with email address in addition to
     username: Last update October. Issue is email addresses associated with
     multiple accounts. Possibly related to AuthManager work.
   - T128352: RfC: Need to merge Notifications and Watchlist or lack thereof:
     Very much a product question.
   - T130528: RFC: PSR-6 Cache interface in Mediawiki core: Addshore and
     Anomie have been working on this recently.

Approved RFCs

T108655 Standardise access to JavaScript interfaces
 (Roan): Previously approved.
Implementation landed in master this week.

This week's IRC meeting

T130567 RFC: Hygienic transclusions for WYSIWYG, incremental parsing &
composition ,

T11 DOM scopes , and
T114445 Balanced
templates : (Tim) The discussion
exposed two main questions: 1) How to best resolve content model conflicts,
and 2) whether to (eventually) default to balanced templates or not. The
implementation in T114445 
proposes a solution to mark specific templates for balancing, and explores
two options for conflict resolution. The discussion will continue on the
tasks.

Next week’s IRC meeting

T91162 Shadow namespaces 
(brion): A proposed mechanism for sharing content like templates or modules
cross-wiki, similar to how InstantCommons and foreign file repos work.
Kunal is getting ready to work on this.

Under discussion

T124792 RFC: Service Locator for MediaWiki core
 (Daniel): Discussed in IRC
Office hour last week (see task for notes). Discussed at Wikimedia
Hackathon 2016 in Israel .
Implementation under way.

T91162 RFC: Shadow namespaces 
(brion): Scheduled for IRC meeting next week.

T123753 Establish retrospective reports for Security and Performance
incidents  (RobLa): Briefly
discussed at last week's IRC meeting, some activity on the task.

T119908 RFC: Migrate code review / management to Phabricator from Gerrit
 (RobLa): ArchCom is looking for
more detail on CI integration, as well as background on alternatives
considered for code review + CI. On hold.

T39902 RFC: Implement rendering of redlinks as post-processor
 (Gabriel): Solutions for
highlighting links to non-existing pages in Parsoid HTML. Main question is
preprocessing vs. separate metadata processed on client. Experiment shows
that specific link matching / replacement can be done in <2ms for large
documents on the server.

T130663 RFC: Reference API requirements and options
 (Timo): We need to better
define the scope of this RFC and come up with a solid proposal. Open to
input on whether or not this should be blocked on larger product goals
relating to centralised citations. Join WikiCite 2016 in Germany!
https://meta.wikimedia.org/wiki/WikiCite_2016

T18691 RFC: Section headings should have a clickable anchor
 (Timo): Working on better
understanding of the problem space and possible solutions. Volker gathered
various considerations and challenges. Under discussion in Front-end
standards group.

T16950: Support global preferences
 (no shepherd): No clear owner
yet.

T130528 RFC: PSR-6 Cache interface in Mediawiki core
 (no shepherd).

T122942 RFC: Support language variants in the REST API
 (Gabriel): Discussing options
with Reading.

No activity in the last two weeks:

T124504 Transition WikiDev '16 working areas into working groups
 (RobLa)

T66214 Use content hash based image / thumb URLs & define an official thumb
API  (Brion)

T113034 RFC: Overhaul Interwiki map, unify with Sites and WikiMap
 (Daniel)

T128351 RFC: Notifications in core
 (Brion)

T122825 Service ownership and minimum maintenance requirements

[Wikitech-l] ArchCom RFC update #4

2016-04-06 Thread Gabriel Wicke
This week things have been relatively quiet, with many engineers attending
the Jerusalem Hackathon
<https://www.mediawiki.org/wiki/Wikimedia_Hackathon_2016>. Daniel and
others discussed dependency injection
<https://phabricator.wikimedia.org/T124792> at the hackathon, and a first
patch was merged.

With today's conclusion of the Final Comment Period, Max's proposal to require
mbstring support <https://phabricator.wikimedia.org/T129435> was officially
accepted, and the corresponding patch was merged.

At next week's IRC meeting, we will explore improving editing support and
performance with balanced and "hygienic" transclusions.

Gabriel

RFC inbox

T54807: Identify and remove legacy preferences from MediaWiki core
<https://phabricator.wikimedia.org/T54807>, T16950: Support global
preferences <https://phabricator.wikimedia.org/T16950>: These
preference-related RFCs currently don't have a clear owner, and need
product input.

Approved RFCs

T129435 RFC: Drop support for running without mbstring
<https://phabricator.wikimedia.org/T129435> (Max, Gabriel): Most
participants have expressed support. Based on the discussion, the ArchCom
approved the RFC today, and Max's patch
<https://gerrit.wikimedia.org/r/#/c/267309/> was already merged.

Under discussion

T130567 RFC: Hygienic transclusions for WYSIWYG, incremental parsing &
composition <https://phabricator.wikimedia.org/T130567>, T11 DOM scopes
<https://phabricator.wikimedia.org/T11> and T114445 Balanced templates
<https://phabricator.wikimedia.org/T114445>: (Tim) *Scheduled for IRC
discussion next week.*

T124792 Service Locator for MediaWiki core
<https://phabricator.wikimedia.org/T124792> (Daniel): Discussed at Hackathon
<https://etherpad.wikimedia.org/p/wmhack2016_DI>, first patch merged
<https://gerrit.wikimedia.org/r/#/c/264403/>. Implementation under way.

T123753 <https://phabricator.wikimedia.org/T123753> Establish retrospective
reports for Security <https://phabricator.wikimedia.org/tag/security/> and
Performance <https://phabricator.wikimedia.org/tag/performance/> incidents
(RobLa): Briefly discussed at last week's IRC meeting, some activity on the
task.

T119908 <https://phabricator.wikimedia.org/T119908> RFC: Migrate code
review / management to Phabricator from Gerrit (RobLa): ArchCom is looking
for more detail on CI integration, as well as background on alternatives
considered for code review + CI.

T108655 Standardise on how to access/register JavaScript interfaces
<https://phabricator.wikimedia.org/T108655> (Roan) Minimal version was
approved and is being implemented. Discussion has begun about a second RFC
for more contentious changes.

T39902 RFC: Implement rendering of redlinks (in a post-processor?)
<https://phabricator.wikimedia.org/T39902> (Gabriel): Solutions for
highlighting links to non-existing pages in Parsoid HTML. Main question is
preprocessing vs. separate metadata processed on client. Parsing and
Services teams investigating performance trade-offs.

T130663 RFC: Reference API requirements and options
<https://phabricator.wikimedia.org/T130663> (Timo): Working with Gabriel
and others to better define the scope of the RFC and come up with a solid
proposal. Relates to other on-going product goals and may be delayed on
better clarification on those and gathering of other use cases /
requirements.

T18691 RFC: Section headings should have a clickable anchor
<https://phabricator.wikimedia.org/T18691> (Timo): Working on better
understanding of the problem space and possible solutions. Volker gathered
various considerations and challenges on the RFC’s talk page at
mediawiki.org. Check them out!

No activity in the last two weeks:

T130528 RFC: PSR-6 Cache interface in Mediawiki core
<https://phabricator.wikimedia.org/T130528> (no shepherd)

T122942 RFC: Support language variants in the REST API
<https://phabricator.wikimedia.org/T122942> (Gabriel)

T124504 Transition WikiDev '16 working areas into working groups
<https://phabricator.wikimedia.org/T124504> (RobLa)

T66214 Use content hash based image / thumb URLs & define an official thumb
API <https://phabricator.wikimedia.org/T66214> (Brion)

T113034 RFC: Overhaul Interwiki map, unify with Sites and WikiMap
<https://phabricator.wikimedia.org/T113034> (Daniel)

T122825 Service ownership and minimum maintenance requirements
<https://phabricator.wikimedia.org/T122825> (Gabriel)

T128351 RFC: Notifications in core
<https://phabricator.wikimedia.org/T128351> (Brion)

T118517 RFC: Use  for media
<https://phabricator.wikimedia.org/T118517> (Brion)

T88596 Improving extension management
<https://phabricator.wikimedia.org/T88596> (Daniel)

T11 RFC: Introduce notion of DOM scopes in wikitext
<https://phabricator.wikimedia.org/T11> (Tim)

T120164 RFC: Institute "las

[Wikitech-l] ArchCom RFC update

2016-03-30 Thread Gabriel Wicke
intenance requirements
<https://phabricator.wikimedia.org/T122825> (Gabriel)

T128351 RFC: Notifications in core
<https://phabricator.wikimedia.org/T128351> (Brion)

T118517 RFC: Use  for media
<https://phabricator.wikimedia.org/T118517> (Brion)

T88596 Improving extension management
<https://phabricator.wikimedia.org/T88596> (Daniel)

T11 RFC: Introduce notion of DOM scopes in wikitext
<https://phabricator.wikimedia.org/T11> (Tim)

T130528 RFC: PSR-6 Cache interface in Mediawiki core
<https://phabricator.wikimedia.org/T130528> (No shepherd)


-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] ArchCom RFC update

2016-03-23 Thread Gabriel Wicke
Hi,

please have a look at this week's summary of new and ongoing RFC
discussions. There are several new RFCs, and some existing ones are moving
close to a decision. No RFCs were decided finally this week.

Because of the parallel Hackathon, no IRC discussion is scheduled for next
week.

Gabriel

New RFCs:

T130663 WIP RFC: Reference API requirements and options
 (Timo): API access and
component  markup for references; focus on gathering use cases /
requirements.

T122942 RFC: Support language variants in the REST API
 (Gabriel): Different options
for supporting languange variant selection in the REST API. Needed for
languages like Chinese.

T122825 Service Ownership and Maintenance
 (Gabriel): Ownership and
minimum maintenance requirements for production services. Strongly driven
by unclear ownership of OCG (PDF renderer).

T39902 RFC: Implement rendering of redlinks (in a post-processor?)
 (Gabriel): Solutions for
highlighting links to non-existing pages in Parsoid HTML. Main question is
preprocessing vs. separate metadata processed on client.

T130528 RFC: PSR-6 Cache interface in Mediawiki core
 (No shepherd): Exploring use of
standard PHP cache interface.

Today's IRC session:

T124792 Service Locator for MediaWiki core
 (Daniel): Introduce a service
locator (aka DI container) to allow code in mediaWiki core to make use of
the Dependency Injection (DI) and Service Locator (SL) patterns.

The discussion showed general support. Several participants expressed a
desire to write more code with it before making a final call. Concrete
suggestions on areas would be welcome. Tentative working group forming,
aiming to discuss at Jerusalem Hackathon.

Under discussion:

T129435 RFC: drop support for running without mbstring
 (Gabriel): Very focused RFC by
Max. Main question in discussion so far is whether polyfilling is worth it.
Max reaching out to mediawiki-l.

T108655 Standardise on how to access/register JavaScript interfaces
 (Roan): No update since last
week, I need to split this task but I haven’t had time to yet. Last week’s
update:

Considering to split out contentious part (file-based require, or something
like it; to support embedding libraries), move forward on less
controversial part (basic module-name-based require infrastructure)

T18691 RFC: Section headings should have a clickable anchor
 (Timo): Under discussion with
Volker and  Frontend Standards Group. Volker and team to collect different
benefits and concerns to determine whether this is generally a desirable
feature. And to explore other conceptually different solutions to the
underlying use case of “sharing a link to a section” (e.g. a better table
of contents, or live address bar).

T124504 Transition WikiDev '16 working areas into working groups
 (RobLa): No concrete progress;
MZMcBride advocates for organic growth.

T128351 RfC: Notifications in core
 (Brion): No movement last week.
Needs clarification of interfaces & scope as follow-up from IRC meeting.

T66214 Use content hash based image / thumb URLs & define an official thumb
API  (Brion): Clarified
requirements & priorities in last week's IRC discussion. Needs update to
reflect discussion.

T118517 [RFC] Use  for media
 (Brion): Revisit soon.

T88596 Improving extension management
 (Daniel): Discussion is picking
up again, patch for review.

T113034 RFC: Overhaul Interwiki map, unify with Sites and WikiMap
 (Daniel): Has been discussed
before, needs somebody to actually take this on.

T11 [RFC] Introduce notion of DOM scopes in wikitext
 (Tim): Active related
discussion and prototyping at Balanced templates
 and Hygienic transclusions for
WYSIWYG, incremental parsing & composition: Options and trade-offs
.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Services] [ANNOUNCEMENT] RESTBase and related services DC switch-over test

2016-03-19 Thread Gabriel Wicke
The Services DC fail-over test finished without user impact, and
traffic is now switched back to eqiad.

We found an issue with one of the Cassandra nodes running out of
memory after switching update processing to codfw, as well as a
hiccup with one instance in eqiad. Due to the redundant set-up, this
did not affect operations. We are investigating these issues, and will
address them before the general fail-over test in April.

Many thanks to Marko Obrovac, Eric Evans (Services), Filippo
Giunchedi, Giuseppe Lavagetto and Emanuele Rocca (Operations), who
prepared the infrastructure to make the switch-over this smooth.

Gabriel

On Thu, Mar 17, 2016 at 3:10 AM, Marko Obrovac <mobro...@wikimedia.org> wrote:
> FYI, the test has started. We are in the process of switching the traffic to
> the Dallas DC.
>
> Cheers,
> Marko
>
> On 14 March 2016 at 22:54, Marko Obrovac <mobro...@wikimedia.org> wrote:
>>
>> Hello,
>>
>> The WMF’s technology department has for this quarter the goal of testing
>> and temporarily switching the main operational data centre from Eqiad
>> (located in Chicago) to Codfw (located in Dallas)~[1,2]. This includes both
>> back-end-processing as well as serving live traffic from it.
>>
>> As a part of this effort, we are scheduling a switch-over for RESTBase and
>> its back-end services, including: Parsoid, the Mobile Content Service,
>> CXServer, Mathoid, Citoid, Apertium and Zotero~[3]. Technically, it will not
>> be a real switch-over per se, because we will keep all of those services
>> active in both DCs. However, external traffic will be directed to the Dallas
>> DC only.
>>
>> === When is it and what does it mean for me? ===
>> The switch-over test is planned for this Thursday, 2016-03-17. We have
>> allotted a three-hour window for this~[4].  There is nothing users should do
>> before or after the switch; it will be transparent for them. There are two
>> things users should note, though:
>>
>> 1) At the time of the switch-over, users might receive error responses for
>> a while (both 4xx and 5xx status codes). While we will test most of the
>> things ahead of time, we cannot test the actual traffic shifting, so small
>> bumps might be noticed.
>> 2) After the switch to the Dallas DC, users will likely see their response
>> latencies slightly elevated. During the test, some requests might experience
>> a slightly larger latency. This will occur because all of the services that
>> will be responding to live requests still need to contact the main MediaWiki
>> cluster, which will remain in Eqiad (the other DC) until a complete
>> switch-over of the infrastructure is performed. However, given the multiple
>> levels of caching, the 40 ms of penalty to go cross-DC for an uncached API
>> request does not seem too taxing.
>>
>> === Wait, what about my service X running in WMF production? ===
>> If you are a service owner of one the aforementioned services, there are
>> no explicit actions you should take prior to, during or after the
>> switch-over test. This test could, however, affect your service depending on
>> whether it usually serves live traffic or is mostly operational during
>> various internal updates. MediaWiki and JobQueue processing will still be
>> performed in Eqiad, so in the latter case your service should not see a
>> change in the usage pattern. If, however, your service is mostly in charge
>> of responding to live requests coming through RESTBase, those will be
>> handled by instances in Codfw. However, as these services are full replicas
>> of their Eqiad counterparts and are stateless, no major breakage will
>> happen.
>>
>> Should you have any questions or concerns, don’t hesitate to contact us
>> here or on IRC (#wikimedia-services @ freenode).
>>
>> Best,
>> Marko Obrovac, PhD
>> Senior Services Engineer
>> Wikimedia Foundation
>>
>> [1]
>> https://www.mediawiki.org/wiki/Wikimedia_Engineering/2015-16_Q3_Goals#Technology
>> [2] https://phabricator.wikimedia.org/project/profile/1723/
>> [3] https://phabricator.wikimedia.org/T127974
>> [4]
>> https://wikitech.wikimedia.org/wiki/Deployments#Thursday.2C.C2.A0March.C2.A017
>>
>
>
>
> --
> Marko Obrovac, PhD
> Senior Services Engineer
> Wikimedia Foundation
>
> ___
> Services mailing list
> servi...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/services
>



-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] ArchCom RFC update

2016-03-19 Thread Gabriel Wicke
Hi,


I am writing to give you a quick summary of which RFCs we are currently
working on in the ArchCom. Each RFC has the name of its 'shepherd' listed
next to it. A shepherd is responsible for working with the RFC author to
move the RFC forward, publicize it, and represent it in the architecture
committee.


Please have a look, and join the conversation on the RFCs that are of
interest to you.


Gabriel


New RFCs:

T129435 RFC: drop support for running without mbstring (Gabriel): New,
very focused RFC by Max, discussion started.

Upcoming IRC sessions:

T124792 Service Locator for MediaWiki core (Daniel): Introduce a service
locator (aka DI container) to allow code in MediaWiki core to make use of
the Dependency Injection (DI) and Service Locator (SL) patterns.

Under discussion:

T108655 Standardise on how to access/register JavaScript interfaces
(Roan): Considering splitting out the contentious part (file-based
require, or something like it; to support embedding libraries), and
moving forward on the less controversial part (basic module-name-based
require infrastructure).

T18691 RFC: Section headings should have a clickable anchor (Timo):
Reworking proposal with designers & Volker.

T124504 Transition WikiDev '16 working areas into working groups
(RobLa): Finding folks to fully assume ownership on following up from
each session has been difficult.

T128351 RfC: Notifications in core (Brion): Discussed last week, now
clarifying interfaces & scope.

T66214 Use content hash based image / thumb URLs & define an official
thumb API (Brion): Discussion last & this week.

T118517 [RFC] Use <figure> for media (Brion): Revisit soon.

T124752 [RFC] Expiring watch list entries (Daniel): Iterating on the
design, discussion on the task.

T113034 RFC: Overhaul Interwiki map, unify with Sites and WikiMap
(Daniel): Has been discussed before, needs somebody to actually take
this on.

T11 [RFC] Introduce notion of DOM scopes in wikitext (Tim): Under
discussion in the parsing team, early stage.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] New RFC: remove mbstring fallbacks

2016-03-19 Thread Gabriel Wicke
So far, no serious concerns have been expressed on the task [1].

Users of Debian-based distributions or Windows should have mbstring
enabled by default, so should not be affected. However, users of
RPM-based distributions would potentially need to install an extra
package.

So, I would like to hear from users on RPM-based environments. Do you
already have mbstring installed, and if not, would it be an issue to
install it?

Thanks,

Gabriel

[1]: https://phabricator.wikimedia.org/T129435

On Thu, Mar 10, 2016 at 2:27 AM, Legoktm <legoktm.wikipe...@gmail.com> wrote:
> Hi,
>
> On 03/09/2016 06:47 PM, Max Semenik wrote:
>> Hey, I have a new topic I'd like to discuss. It's about mbstring and
>> whether do we really need to support running without it.
>>
>> The RFC is at https://gerrit.wikimedia.org/r/#/c/267309/
>
> I think you meant <https://phabricator.wikimedia.org/T129435> :)
>
> -- Legoktm
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l



-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] REST API soon redirecting to canonical URLs

2016-03-15 Thread Gabriel Wicke
Hi,

this is to let you know that we will start redirecting GET requests
with non-canonical titles to their canonical equivalent next week.
This is done to improve caching, and to ensure reliable purging of
cached responses.

The vast majority of clients handle redirects transparently, so at
most this will lead to a small slow-down from the redirect. To avoid
being redirected, make sure to use underscores instead of spaces, as
in "Main_Page".

Thanks,

Gabriel Wicke

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Mostly about anglophile devs, and a small complaint about VisualEditor

2016-03-14 Thread Gabriel Wicke
On Sun, Mar 13, 2016 at 3:39 PM, Derk-Jan Hartman
 wrote:
>
>> On 13 mrt. 2016, at 20:36, Amir E. Aharoni  
>> wrote:
>>
>> Ideally, this should some day be real metadata and not templates. Using
>> templates for this is a hack that keeps living long after it should have
>> died.
>>
>> And it should be easy and fast to edit this metadata, no matter if the
>> editor prefers VE or wiki syntax.
>>
>> This is a far-fetched ideal, but that's a how it should be.
>
>
> Also don't forget that VE is still new. you could write twinkle like tools 
> using VE to edit metadata templates quickly, but no one has done that so far.

There is https://en.wikipedia.org/wiki/User:Jackmcbarn/editProtectedHelper,
which "adds the ability to respond to edit requests quickly". It is
using Parsoid, but not VisualEditor.

Gabriel

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Minor REST API cleanup: Remove experimental listings, make timeuuid parameter mandatory for data-parsoid

2016-03-07 Thread Gabriel Wicke
tl;dr: You are *very* likely not affected.

We are planning two changes in the REST API:

1) Remove the experimental /page/html/ and /page/data-parsoid/
listings [1][2]. Our metrics show that these are essentially unused.
The same title listing remains available at /page/title/ [3].

2) Make the `tid` path parameter in the unstable
/page/data-parsoid/{title}/{revision}/{tid} [4] end point mandatory.
Data-parsoid is tied to a specific HTML render, and only requests with
an explicit timeuuid from the corresponding HTML response are
guaranteed to get the correct data-parsoid version.

If things go to plan, we will deploy these changes sometime next week.

Thank you for your understanding,

Gabriel Wicke for the Services team

[1]: https://en.wikipedia.org/api/rest_v1/?doc#!/Page_content/get_page_html
[2]: 
https://en.wikipedia.org/api/rest_v1/?doc#!/Page_content/get_page_data_parsoid
[3]: https://en.wikipedia.org/api/rest_v1/?doc#!/Page_content/get_page_title
[4]: 
https://en.wikipedia.org/api/rest_v1/?doc#!/Page_content/get_page_data_parsoid_title_revision_tid

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] RFC: Defining a policy for REST API result format versioning / negotiation

2016-02-24 Thread Gabriel Wicke
At the conclusion of the Final Comment Period, the Architecture
Committee today decided to accept the policy as summarized in my last
mail:

- Use `Accept` headers for format negotiation.
- Strongly encourage users to explicitly request a specific format
version, and update the default format promptly.

We will now document and implement this policy. The first use of
content negotiation will be an upcoming change in Parsoid's HTML
format.
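
For example, a client that wants to pin a specific response version could
send an Accept header along these lines (Python sketch; the profile string
below is only a placeholder, the authoritative value is in the API
documentation):

    import requests

    headers = {'Accept': 'text/html; charset=utf-8; '
                         'profile="mediawiki.org/specs/html/1.1.0"'}
    r = requests.get('https://en.wikipedia.org/api/rest_v1/page/html/Dog',
                     headers=headers)
    print(r.headers.get('Content-Type'))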

Thank you for your input,

Gabriel


On Wed, Feb 17, 2016 at 3:36 PM, Gabriel Wicke <gwi...@wikimedia.org> wrote:
> The IRC discussion just finished, thanks to everybody who
> participated! You can read a full log on the task [1]. Here is a short
> summary:
>
> == Question 1: How to request a specific response format ==
>
> Overall there was a slight preference for using the Accept header over
> query strings for format negotiation. It was noted that support for
> query strings can be added additionally at a later point.
>
> == Question 2: What to do if no format was specified ==
>
> The main question in the discussion was whether strong encouragement
> will be enough to persuade clients to explicitly specify a format
> version. A common concern was that clients without explicit version in
> the request won't pay attention to announcements either, and will only
> find out when things break.
>
> There was consensus for starting with strong encouragement and quick
> default changes. If most clients continue to omit explicit versions in
> their requests, then we can reconsider *forcing* clients to supply a
> version.
>
> == Next steps ==
>
> The Architecture Committee will officially decide this matter based on
> the discussion at next Wednesday's meeting.
>
> Gabriel
>
> [1]: https://phabricator.wikimedia.org/T124365#2036959
>
> On Mon, Feb 15, 2016 at 7:41 PM, Gabriel Wicke <gwi...@wikimedia.org> wrote:
>> We will discuss options for REST API response format versioning and
>> -negotiation in Wednesday's RFC meeting:
>>
>> Topic: https://phabricator.wikimedia.org/T124365
>> Time: Wednesday 22:00 UTC (2pm PST)
>> Location: #wikimedia-office IRC channel
>>
>> This RFC will then enter its one-week Final Comment Period, after
>> which the Architecture Committee will decide based on the discussion.
>>
>> I'm looking forward to your input on the task or IRC.
>>
>> Gabriel
>>
>> On Thu, Jan 21, 2016 at 4:29 PM, Gabriel Wicke <gwi...@wikimedia.org> wrote:
>>> Hi,
>>>
>>> we are considering a policy for REST API end point result format
>>> versioning and negotiation. The background and considerations are
>>> spelled out in a task and mw.org page:
>>>
>>> https://phabricator.wikimedia.org/T124365
>>> https://www.mediawiki.org/wiki/Talk:API_versioning
>>>
>>> Based on the discussion so far, have come up with the following
>>> candidate solution:
>>>
>>> 1) Clearly advise clients to explicitly request the expected mime type
>>> with an Accept header. Support older mime types (with on-the-fly
>>> transformations) until usage has fallen below a very low percentage,
>>> with an explicit sunset announcement.
>>>
>>> 2) Always return the latest content type if no explicit Accept header
>>> was specified.
>>>
>>> We are interested in hearing your thoughts on this.
>>>
>>> Once we have reached rough consensus on the way forward, we intend to
>>> apply the newly minted policy to an evolution of the Parsoid HTML
>>> format, which will move the data-mw attribute to a separate metadata
>>> blob.
>>>
>>> Gabriel Wicke
>>
>>
>>
>> --
>> Gabriel Wicke
>> Principal Engineer, Wikimedia Foundation
>
>
>
> --
> Gabriel Wicke
> Principal Engineer, Wikimedia Foundation



-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] RFC: Defining a policy for REST API result format versioning / negotiation

2016-02-17 Thread Gabriel Wicke
The IRC discussion just finished, thanks to everybody who
participated! You can read a full log on the task [1]. Here is a short
summary:

== Question 1: How to request a specific response format ==

Overall there was a slight preference for using the Accept header over
query strings for format negotiation. It was noted that support for
query strings can be added additionally at a later point.

== Question 2: What to do if no format was specified ==

The main question in the discussion was whether strong encouragement
will be enough to persuade clients to explicitly specify a format
version. A common concern was that clients without explicit version in
the request won't pay attention to announcements either, and will only
find out when things break.

There was consensus for starting with strong encouragement and quick
default changes. If most clients continue to omit explicit versions in
their requests, then we can reconsider *forcing* clients to supply a
version.

== Next steps ==

The Architecture Committee will officially decide this matter based on
the discussion at next Wednesday's meeting.

Gabriel

[1]: https://phabricator.wikimedia.org/T124365#2036959

On Mon, Feb 15, 2016 at 7:41 PM, Gabriel Wicke <gwi...@wikimedia.org> wrote:
> We will discuss options for REST API response format versioning and
> -negotiation in Wednesday's RFC meeting:
>
> Topic: https://phabricator.wikimedia.org/T124365
> Time: Wednesday 22:00 UTC (2pm PST)
> Location: #wikimedia-office IRC channel
>
> This RFC will then enter its one-week Final Comment Period, after
> which the Architecture Committee will decide based on the discussion.
>
> I'm looking forward to your input on the task or IRC.
>
> Gabriel
>
> On Thu, Jan 21, 2016 at 4:29 PM, Gabriel Wicke <gwi...@wikimedia.org> wrote:
>> Hi,
>>
>> we are considering a policy for REST API end point result format
>> versioning and negotiation. The background and considerations are
>> spelled out in a task and mw.org page:
>>
>> https://phabricator.wikimedia.org/T124365
>> https://www.mediawiki.org/wiki/Talk:API_versioning
>>
>> Based on the discussion so far, have come up with the following
>> candidate solution:
>>
>> 1) Clearly advise clients to explicitly request the expected mime type
>> with an Accept header. Support older mime types (with on-the-fly
>> transformations) until usage has fallen below a very low percentage,
>> with an explicit sunset announcement.
>>
>> 2) Always return the latest content type if no explicit Accept header
>> was specified.
>>
>> We are interested in hearing your thoughts on this.
>>
>> Once we have reached rough consensus on the way forward, we intend to
>> apply the newly minted policy to an evolution of the Parsoid HTML
>> format, which will move the data-mw attribute to a separate metadata
>> blob.
>>
>> Gabriel Wicke
>
>
>
> --
> Gabriel Wicke
> Principal Engineer, Wikimedia Foundation



-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] RFC: Defining a policy for REST API result format versioning / negotiation

2016-02-15 Thread Gabriel Wicke
We will discuss options for REST API response format versioning and
-negotiation in Wednesday's RFC meeting:

Topic: https://phabricator.wikimedia.org/T124365
Time: Wednesday 22:00 UTC (2pm PST)
Location: #wikimedia-office IRC channel

This RFC will then enter its one-week Final Comment Period, after
which the Architecture Committee will decide based on the discussion.

I'm looking forward to your input on the task or IRC.

Gabriel

On Thu, Jan 21, 2016 at 4:29 PM, Gabriel Wicke <gwi...@wikimedia.org> wrote:
> Hi,
>
> we are considering a policy for REST API end point result format
> versioning and negotiation. The background and considerations are
> spelled out in a task and mw.org page:
>
> https://phabricator.wikimedia.org/T124365
> https://www.mediawiki.org/wiki/Talk:API_versioning
>
> Based on the discussion so far, have come up with the following
> candidate solution:
>
> 1) Clearly advise clients to explicitly request the expected mime type
> with an Accept header. Support older mime types (with on-the-fly
> transformations) until usage has fallen below a very low percentage,
> with an explicit sunset announcement.
>
> 2) Always return the latest content type if no explicit Accept header
> was specified.
>
> We are interested in hearing your thoughts on this.
>
> Once we have reached rough consensus on the way forward, we intend to
> apply the newly minted policy to an evolution of the Parsoid HTML
> format, which will move the data-mw attribute to a separate metadata
> blob.
>
> Gabriel Wicke



-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Mass migration to new syntax - PRO or CON?

2016-02-12 Thread Gabriel Wicke
Overall I'm PRO, as consistency is worth a lot, and tools can apply
such changes consistently and efficiently.

We have applied broad formatting changes to large JS codebases using
jscs, which has worked well when those changes were well prepared.
Typically, this involved gradually refining the tool settings until a
reasonable diff was achieved.

On Fri, Feb 12, 2016 at 12:44 PM, Chad <innocentkil...@gmail.com> wrote:
> On Fri, Feb 12, 2016 at 9:14 AM Chad <innocentkil...@gmail.com> wrote:
>
>> On Fri, Feb 12, 2016 at 7:27 AM Daniel Kinzler <dan...@brightbyte.de>
>> wrote:
>>
>>> CON: don't do mass migration to new syntax, only start using new styles
>>> and
>>> features when touching the respective bit of code anyway. The argument is
>>> here
>>> that touching many lines of code, even if it's just for whitespace
>>> changes,
>>> causes merge conflicts when doing backports and when rebasing patches.
>>> E.g. if
>>> we touch half the files in the codebase to change to the new array
>>> syntax, who
>>> is going to manually rebase the couple of hundred patches we have open?
>>>
>>>
>>> As can be seen on the proposed patch I linked, several of the long term
>>> developers oppose mass changes like this. A quick round of feedback in the
>>> architecture committee draws a similar picture. However, perhaps there are
>>> compelling arguments for doing the mass migration that we haven't heard
>>> yet. So
>>> please give a quick PRO or CON, optionally with some rationale.
>>>
>>> My personal vote is CON. No rebase hell please! Changing to the syntax
>>> doesn't
>>> buy us anything.
>>>
>>>
>> CON, for all the reasons you mentioned. Also: style only changes are pain
>> when you're
>> trying to annotate/blame a particular line of code.
>>
>> ESPECIALLY for something so silly as array formatting which gains us
>> *absolutely nothing*
>>
>> -Chad
>>
>
> I change my vote to PRO.
>
> Mainly because people are gonna do it anyway...
>
> Last thoughts on the thread, I got bigger fish to fry than array syntax
> sugar :D
>
> -Chad
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l



-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Wikitext-l] [Mediawiki-api] [Engineering] Deprecating rest.wikimedia.org in favor of /api/rest_v1/

2016-02-01 Thread Gabriel Wicke
Hi Luigi,

On Fri, Jan 29, 2016 at 12:31 PM, Luigi Assom  wrote:
> -  how to extract _ID from ETag in headers:
> GET /page/title/{title}

the page id is indeed not directly exposed in the HTML response.
However, the revision number is exposed as part of the ETag. This can
then be used to request revision metadata including the page id at
https://en.wikipedia.org/api/rest_v1/?doc#!/Page_content/get_page_revision_revision.
This is admittedly not very convenient, so I created
https://phabricator.wikimedia.org/T125453 for generally improved page
id support in the REST API.
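
As a rough illustration of that flow (Python; error handling is omitted,
and 'DNA' is just an example title):

    import requests

    base = 'https://en.wikipedia.org/api/rest_v1'
    r = requests.get(base + '/page/html/DNA')

    # The ETag has the form "<revision>/<render timeuuid>".
    revision = r.headers['ETag'].replace('W/', '').strip('"').split('/')[0]

    # Revision metadata, which includes the page id:
    info = requests.get(base + '/page/revision/' + revision).json()
    print(info)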

> - how to ensure
> GET /page/title/{title with different char encoding or old titles are always
> resolved to last canonical version}

The storage backing this end point is automatically kept up to date
with edits and dependency changes. Edits in particular should be
reflected within a few seconds.

>> If you refer to
>>
>> https://en.wikipedia.org/api/rest_v1/?doc#!/Page_content/get_page_graph_png_title_revision_graph_id,
>> this is an end point exposing rendered graph images for
>> https://www.mediawiki.org/wiki/Extension:Graph (as linked in the end
>> point documentation).
>
>
> Oh very interesting!
> So basically html markup can be extended ?
> Would it be possible to share json objects as html5 markup and embed them in
> wiki pages?

The graph extension is using the regular MediaWiki tag extension
mechanism: https://www.mediawiki.org/wiki/Manual:Tag_extensions

Graphs are indeed defined using JSON within this tag.

> I want to avoid to update my graph just because titles changes: entities are
> always the same.

Makes sense. The current API is optimized for the common case of
access by title, but we will consider adding access by page ID as
well.

> I still don't know what parsoid is.

Parsoid is the service providing semantic HTML and a bi-directional
conversion between that & wikitext:
https://www.mediawiki.org/wiki/Parsoid

Gabriel

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Parsoid entrypoint http://parsoid-lb.eqiad.wikimedia.org being decommissioned

2016-02-01 Thread Gabriel Wicke
Multi-line input for transform end points is now live:
https://en.wikipedia.org/api/rest_v1/?doc#!/Transforms/post_transform_wikitext_to_html_title_revision

On Sat, Jan 30, 2016 at 10:46 AM, Gabriel Wicke <gwi...@wikimedia.org> wrote:
> Basic multi-line input support for wikitext / html transforms turned
> out to be quite straightforward to implement:
> https://phabricator.wikimedia.org/T110712#1984226
>
> Production should have multi-line inputs some time next week.
>
> On Sat, Jan 30, 2016 at 9:11 AM, Subramanya Sastry
> <ssas...@wikimedia.org> wrote:
>> On 01/30/2016 09:50 AM, Bartosz Dziewoński wrote:
>>>
>>> So what is the replacement for
>>> http://parsoid-lb.eqiad.wikimedia.org/_wikitext/ if I just want to see how
>>> Parsoid renders a piece of wikitext? It seems the fancy forms at
>>> https://en.wikipedia.org/api/rest_v1/?doc don't actually allow me to do the
>>> same simple thing.
>>>
>>> I figured out I must use "/transform/wikitext/to/html{/title}{/revision}"
>>> (that's a mouthful), but the 'wikitext' field there allows only a single
>>> line. It's also fairly inconvenient that the page displays escaped HTML
>>> code.
>>
>>
>> https://phabricator.wikimedia.org/T110712 is where you can ask support for
>> it. :)
>>
>> Alternatively, as a developer, if you installed Parsoid, you will find the
>> CLI parse.js tool a lot more convenient to use. YMMV.
>>
>> Subbu.
>>
>>
>> _______
>> Wikitech-l mailing list
>> Wikitech-l@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
>
>
> --
> Gabriel Wicke
> Principal Engineer, Wikimedia Foundation



-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Engineering] Announcing mediawiki-containers, a Docker-based MediaWiki installer

2016-01-30 Thread Gabriel Wicke
I see containers as one ingredient in a more automated and tested
pipeline from development through CI to production.
mediawiki-containers could expand to cover the development use case,
but I think we can and should move from there into CI, and finally
production.

Right now, Yuvi is evaluating the Kubernetes cluster manager in labs.
Its features include scheduling of "pods" (groups of containers) to
hardware nodes, networking, rolling deploys and more. While all these
features provide a very high degree of automation, they also mean that
failures in Kubernetes can have grave consequences. I think operations
are wise to wait for Kubernetes to mature a bit further before
considering it for critical production use cases.

Rather than waiting until one-stop cluster managers are mature, we
could also start with a more traditional config / deploy system. I
have played with this approach using Ansible [1] a while ago, and the
ergonomics are pretty much the same as git-based deploys. There is
also some support to run docker images in systemd, which could be an
alternative if we want to avoid the dependency on the docker runtime
in production. This older task lists some options:
https://phabricator.wikimedia.org/T93439

Lets get together and figure out a plan.

Gabriel

[1]: http://docs.ansible.com/ansible/docker_module.html

On Fri, Jan 29, 2016 at 8:23 PM, James Forrester
<jforres...@wikimedia.org> wrote:
> On 27 December 2015 at 11:52, Ori Livneh <o...@wikimedia.org> wrote:
>>
>> On Thu, Dec 24, 2015 at 3:57 PM, Gabriel Wicke <gwi...@wikimedia.org>
>> wrote:
>>>
>>> I am writing to announce mediawiki-containers [1], a simple installer for
>>> MediaWiki with VisualEditor, Parsoid, RESTBase and other services, using
>>> Linux containers.
>>
>>
>> This is very nice work -- kudos. Is it too soon to envision running this
>> (or rather, some future iteration) in production at Wikimedia? What would
>> need to happen?
>
>
> Ping on this. I for one would be interested too. :-)
>
> J.
> --
> James D. Forrester
> Lead Product Manager, Editing
> Wikimedia Foundation, Inc.
>
> jforres...@wikimedia.org | @jdforrester



-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Parsoid entrypoint http://parsoid-lb.eqiad.wikimedia.org being decommissioned

2016-01-30 Thread Gabriel Wicke
Basic multi-line input support for wikitext / html transforms turned
out to be quite straightforward to implement:
https://phabricator.wikimedia.org/T110712#1984226

Production should have multi-line inputs some time next week.

On Sat, Jan 30, 2016 at 9:11 AM, Subramanya Sastry
<ssas...@wikimedia.org> wrote:
> On 01/30/2016 09:50 AM, Bartosz Dziewoński wrote:
>>
>> So what is the replacement for
>> http://parsoid-lb.eqiad.wikimedia.org/_wikitext/ if I just want to see how
>> Parsoid renders a piece of wikitext? It seems the fancy forms at
>> https://en.wikipedia.org/api/rest_v1/?doc don't actually allow me to do the
>> same simple thing.
>>
>> I figured out I must use "/transform/wikitext/to/html{/title}{/revision}"
>> (that's a mouthful), but the 'wikitext' field there allows only a single
>> line. It's also fairly inconvenient that the page displays escaped HTML
>> code.
>
>
> https://phabricator.wikimedia.org/T110712 is where you can ask support for
> it. :)
>
> Alternatively, as a developer, if you installed Parsoid, you will find the
> CLI parse.js tool a lot more convenient to use. YMMV.
>
> Subbu.
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l



-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Mediawiki-api] [Engineering] Deprecating rest.wikimedia.org in favor of /api/rest_v1/

2016-01-29 Thread Gabriel Wicke
Luigi,

On Thu, Jan 28, 2016 at 2:09 AM, XDiscovery Team <i...@xdiscovery.com> wrote:
> I tried /rest_v1/ endpoint and it is terribly fast.

that is great to hear. A major goal is indeed to provide high volume
and low latency access to our content.

> @Strainu / @Gabriel , what does  'graph' extension do ?

If you refer to
https://en.wikipedia.org/api/rest_v1/?doc#!/Page_content/get_page_graph_png_title_revision_graph_id,
this is an end point exposing rendered graph images for
https://www.mediawiki.org/wiki/Extension:Graph (as linked in the end
point documentation).

> I have few questions for using proxy cache:
> 1# Is it possible to query a page by page_ID and include redirect?

We don't currently provide access by page ID. Could you describe your
use case a bit to help us understand how access by page id would help
you?

> /page/title/{title}
> allow to get metadata by page, including the pageID , but I would like to
> have final page redirect (e.g. dna return 7956 and I would like to fetch
> 7955 of redirected 'DNA' )

We are looking into improving our support for redirects:
https://phabricator.wikimedia.org/T118548. Your input on this topic
would be much appreciated.

> /page/html/{title} get the article but page_ID / curid is missing in source
> I would like to get the two combined.

This information is actually included in the response, both in the
`ETag` header and in the <head> of the HTML itself. I have updated the
documentation to spell this out more clearly in [1]. The relevant
addition is this:

The response provides an `ETag` header indicating the revision and
render timeuuid separated by a slash (ex: `ETag:
701384379/154d7bca-c264-11e5-8c2f-1b51b33b59fc`). This ETag can be
passed to the HTML save end point (as `base_etag` POST parameter), and
can also be used to retrieve the exact corresponding data-parsoid
metadata, by requesting the specific `revision` and `tid` indicated by
the `ETag`.
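
To illustrate, a minimal Python sketch of this round trip (the title is a
placeholder, and error handling is omitted):

    import requests
    from urllib.parse import quote

    base = 'https://en.wikipedia.org/api/rest_v1'
    title = quote('Dog', safe='')

    r = requests.get('%s/page/html/%s' % (base, title))
    etag = r.headers['ETag'].replace('W/', '').strip('"')
    revision, tid = etag.split('/', 1)

    # Fetch the data-parsoid metadata matching exactly this render; the same
    # etag value can later be passed back as `base_etag` when saving HTML.
    dp = requests.get('%s/page/data-parsoid/%s/%s/%s'
                      % (base, title, revision, tid))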

> 2# The rest are experimental:
> what could happen if a query fail?
> Does it raise an error, return 404 page or what else?

The stability markers are primarily about request and response
formats, and not about technical availability. Experimental end points
can change at any time, which can result in errors (if the request
interface changed), or return a different response format.

We are currently discussing the use of `Accept` headers for response
format versioning at
https://www.mediawiki.org/wiki/Talk:API_versioning. This will allow us
to more aggressively stabilize end points by giving us the option of
tweaking response formats without breaking existing clients.

> I am thinking if possible to use api.wikipedia as fallback, and use proxy
> cache as primary source any ajax example for doing that to handle possible
> failures?

Yes, this is certainly possible. However, you can rely on end points
currently marked as "unstable" in the REST API. Basically all of them
are used by a lot of production clients at this point, and are very
reliable. Once we introduce general `Accept` support, basically all of
the unstable end points will likely become officially "stable", and
several `experimental` end points will graduate to `unstable`.

> 3# Does /rest/ endpoint exist also for other languages?

Yes, it is available for all 800+ public Wikimedia projects at /api/rest_v1/.


[1]: 
https://github.com/wikimedia/restbase/pull/488/files#diff-2b6b60416eaafdf0ab45f6c9ffb8be3aR225

-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] [FINAL COMMENT PERIOD] [RFC]: Defining a policy for REST API result format versioning / negotiation

2016-01-29 Thread Gabriel Wicke
This is now entering its final comment period, so please weigh in at
https://phabricator.wikimedia.org/T124365.

Based on your input, the Parsing, Editing & Services teams will make a
decision on this next Wednesday, Feb 2nd.

Thanks,

Gabriel

On Thu, Jan 21, 2016 at 4:29 PM, Gabriel Wicke <gwi...@wikimedia.org> wrote:
> Hi,
>
> we are considering a policy for REST API end point result format
> versioning and negotiation. The background and considerations are
> spelled out in a task and mw.org page:
>
> https://phabricator.wikimedia.org/T124365
> https://www.mediawiki.org/wiki/Talk:API_versioning
>
> Based on the discussion so far, have come up with the following
> candidate solution:
>
> 1) Clearly advise clients to explicitly request the expected mime type
> with an Accept header. Support older mime types (with on-the-fly
> transformations) until usage has fallen below a very low percentage,
> with an explicit sunset announcement.
>
> 2) Always return the latest content type if no explicit Accept header
> was specified.
>
> We are interested in hearing your thoughts on this.
>
> Once we have reached rough consensus on the way forward, we intend to
> apply the newly minted policy to an evolution of the Parsoid HTML
> format, which will move the data-mw attribute to a separate metadata
> blob.
>
> Gabriel Wicke



-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Changes in the RFC decision making process

2016-01-28 Thread Gabriel Wicke
In the last weeks we have been exploring ways to improve our technical
consensus building and decision making process. I wrote a short RFC
[1] describing some issues, and proposed to adopt ideas from the Rust
community [2] to address them. The discussion on the task and in an
IRC meeting showed broad support for the proposals.

In yesterday's architecture committee meeting, we decided to adopt
much of the Rust RFC decision making process [3] on a trial basis.
Concretely, this means:

- We will nominate a member of the architecture committee as a
shepherd, guiding an active RFC through the process. Among other
things, the shepherd is responsible for informing all relevant
stakeholders of the ongoing discussion on the task. The shepherd might
also lead an IRC discussion on the RFC, which will be summarized on
the task.

- Once the discussion on a task plateaus or stalls, the shepherd (in
coordination with the RFC author(s)) announces and widely publicizes a
"Final Comment Period", which is one week.

- At the end of the "Final Comment Period", the architecture committee
decides based on the points made in the RFC discussion, and justifies
its decision based on the overall project principles and priorities.
If any new facts or aspects are surfaced in this discussion, a new
Final Comment Period needs to be started before making a decision.

For now, we are holding off on the second part of the RFC, the
introduction of working groups. There is agreement that we need to
broaden the involvement and scale the process, but the details of how
are still under discussion.

Gabriel

[1]: https://www.mediawiki.org/wiki/Requests_for_comment/Governance
[2]: https://github.com/rust-lang/rfcs/blob/master/text/1068-rust-governance.md
[3]: 
https://github.com/rust-lang/rfcs/blob/master/text/1068-rust-governance.md#decision-making

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Deprecating rest.wikimedia.org in favor of /api/rest_v1/

2016-01-25 Thread Gabriel Wicke
We have decided to officially retire the rest.wikimedia.org domain in
favor of /api/rest_v1/ at each individual project domain. For example,


  https://rest.wikimedia.org/en.wikipedia.org/v1/?doc

becomes

  https://en.wikipedia.org/api/rest_v1/?doc

Most clients already use the new path, and benefit from better
performance from geo-distributed caching, no additional DNS lookups,
and sharing of TLS / HTTP2 connections.

We intend to shut down the rest.wikimedia.org entry point around
March, so please adjust your clients to use /api/rest_v1/ soon.

Thank you for your cooperation,

Gabriel

-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Deprecating rest.wikimedia.org in favor of /api/rest_v1/

2016-01-25 Thread Gabriel Wicke
Strainu,

On Mon, Jan 25, 2016 at 11:01 AM, Strainu  wrote:
> Hi,
>
> Does this apply to the Graph extension as well?

the graph extension has been using /api/rest_v1/ right from the start,
so it's likely that no changes are needed for graphs.

Gabriel

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Engineering] Deprecating rest.wikimedia.org in favor of /api/rest_v1/

2016-01-25 Thread Gabriel Wicke
On Mon, Jan 25, 2016 at 11:38 AM, Oliver Keyes  wrote:
> Will it apply to the pageviews API as well?

It will, but the canonical URL for this has always been
https://wikimedia.org/api/rest_v1/?doc, which will continue to work.
Are you aware of any pageview users hitting rest.wikimedia.org?

In any case, we'll check the logs for remaining rest.wikimedia.org
accesses & make an effort to remind remaining users before
decommissioning it.

Gabriel

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] RFC: Defining a policy for REST API result format versioning / negotiation

2016-01-21 Thread Gabriel Wicke
Hi,

we are considering a policy for REST API end point result format
versioning and negotiation. The background and considerations are
spelled out in a task and mw.org page:

https://phabricator.wikimedia.org/T124365
https://www.mediawiki.org/wiki/Talk:API_versioning

Based on the discussion so far, have come up with the following
candidate solution:

1) Clearly advise clients to explicitly request the expected mime type
with an Accept header. Support older mime types (with on-the-fly
transformations) until usage has fallen below a very low percentage,
with an explicit sunset announcement.

2) Always return the latest content type if no explicit Accept header
was specified.

We are interested in hearing your thoughts on this.

Once we have reached rough consensus on the way forward, we intend to
apply the newly minted policy to an evolution of the Parsoid HTML
format, which will move the data-mw attribute to a separate metadata
blob.

Gabriel Wicke

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] RESTBase 0.9.2 (security release)

2016-01-20 Thread Gabriel Wicke
A vulnerability has been found in RESTBase v0.9.1 and earlier that
allowed attackers to read arbitrary files on the host system by
passing a specially crafted URL. This vulnerability has been fixed in
[1].

All RESTBase users are strongly encouraged to upgrade to v0.9.2
immediately. Files readable by the RESTBase service user might have
been accessed by third parties, so appropriate measures should be
taken.

mediawiki-containers [2] users with automatic updates enabled have
already been upgraded to v0.9.2.

-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation

[1]: 
https://github.com/wikimedia/restbase/commit/1ea649306ae4e85ab2cee5a36318e990a4fca3f5
[2]: https://github.com/wikimedia/mediawiki-containers

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Randy Shoup on architecture experiences from Google and Ebay

2016-01-03 Thread Gabriel Wicke
Many of you have already seen this when it came out, but I think it's worth
re-posting Randy Shoup's summary of architecture experiences from his time
at Google and Ebay:

http://www.infoq.com/presentations/service-arch-scale-google-ebay

For those interested in a quick summary of his key points, Todd Hoff has
one at
http://highscalability.com/blog/2015/12/1/deep-lessons-from-google-and-ebay-on-building-ecosystems-of.html
.

Gabriel
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Announcing mediawiki-containers, a Docker-based MediaWiki installer

2015-12-24 Thread Gabriel Wicke
Hi Riccardo,

On Thu, Dec 24, 2015 at 4:02 PM, Riccardo Iaconelli  wrote:
> We have a full Mediawiki stack, including OCG, memcached and Parsoid, and it
> has been working like a breeze on many platforms.

I'm glad to hear that Docker has worked well for you. I wasn't aware
of your images, so thanks for the pointers!

> How can we cooperate?

The focus of mediawiki-containers is currently primarily on small
installs with limited hardware resources, but I think that we can make
the core flexible enough to support an ecosystem of optional features
and services like OCG.

Your input on what you need out of the base mediawiki image would be
much appreciated. I think mediawiki-docker [1] already covers much of
what you are doing in https://github.com/WikiToLearn/WebSrv, but I
also see some bits like X-Forwarded-For handling that are still
missing. What is your view on switching to HHVM?

Another obvious opportunity is to integrate your work on OCG as an
optional feature in the installer. The multi-image setup in
mediawiki-containers should make this relatively straightforward.

Gabriel

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Announcing mediawiki-containers, a Docker-based MediaWiki installer

2015-12-24 Thread Gabriel Wicke
I am writing to announce mediawiki-containers [1], a simple installer for
MediaWiki with VisualEditor, Parsoid, RESTBase and other services, using
Linux containers.

The main goal of this project is to make it really easy to set up and
maintain a fully featured MediaWiki system on a wide range of platforms.
The project is in an early stage, but already supports full installation on
Ubuntu, Debian and other systemd-based distributions, as well as starting
containers on OS X via the Docker toolbox [6].

These are the basic steps involved in setting up your own MediaWiki
instance with VisualEditor:

1) Get a Linux VM in labs or from a hosting provider, and select Debian
(Jessie or newer) or Ubuntu 15.04+ as the distribution. Commercial VMs with
reasonable specifications cost about $5 per month [2].
2) Log into your VM, and run this command [7]:
curl
https://raw.githubusercontent.com/wikimedia/mediawiki-containers/master/mediawiki-containers
| sudo bash
3) Answer the questions in the installer.

Here is a screencast of an installer run, illustrating steps 2) and 3):
  https://people.wikimedia.org/~gwicke/mediawiki-containers-install.ogv

Under the hood, mediawiki-containers uses several Docker containers:
- wikimedia/mediawiki [3] with MediaWiki 1.27-wmf9 and VisualEditor.
- wikimedia/mediawiki-node-services [4] with Parsoid and RESTBase running
in a single process to minimize memory use.
- MariaDB as the database backend [5].

Data and configurations are stored on the host system in
/srv/mediawiki-containers/data, which means that upgrading is as simple as
fetching the latest container images by re-running the installer.
Optionally, the installer can set up automated nightly updates, which helps
to keep your wiki installation up to date.

The project is brand new, so there is a fair chance that you will encounter
bugs. Please report issues at
https://phabricator.wikimedia.org/maniphest/task/create/?projects=mediawiki-containers
.

Here are some ideas we have for the next steps:

- Forward `/api/rest_v1/` to RESTBase & configure RESTBase updates. Enable
Wikitext / HTML switching in VE.
- Improve security:
  - Run each container under a different, unprivileged user.
  - Secure the install / update process with signatures.
- Add popular extensions, and streamline the support for custom extensions.
- Add services like mathoid, graphoid.
- Use the HHVM PHP runtime instead of Zend, possibly using ideas from
https://github.com/kasperisager/php-dockerized.
- Support developer use cases:
  - Optionally mount code volumes from the host system.
  - Improve configuration customization support.
- Support for more distributions.

Let us know what you think & what you would like to see next at
https://phabricator.wikimedia.org/T92826.

Happy holidays,

Gabriel Wicke and the Services team


[1]: https://github.com/wikimedia/mediawiki-containers
[2]:
http://serverbear.com/compare?Sort=BearScore=desc+Type=VPS+Cost=-=-=5-=-=KVM
[3]: https://github.com/wikimedia/mediawiki-docker and
https://hub.docker.com/r/wikimedia/mediawiki
[4]: https://github.com/wikimedia/mediawiki-node-services and
https://hub.docker.com/r/wikimedia/mediawiki-node-services/
[5]: https://hub.docker.com/_/mariadb/
[6]: https://docs.docker.com/mac/step_one/
[7]: We agree that `curl | bash` has its risks, but it is hard to beat for
simplicity. The Chef project has a good discussion of pros, cons &
alternatives at
https://www.chef.io/blog/2015/07/16/5-ways-to-deal-with-the-install-sh-curl-pipe-bash-problem/
.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Peer-to-peer sharing of the content of Wikipedia through WebRTC

2015-11-30 Thread Gabriel Wicke
On Mon, Nov 30, 2015 at 4:02 PM, Bryan Davis  wrote:
> On Mon, Nov 30, 2015 at 4:03 PM, Brian Wolff  wrote:
>>
>> If we wanted to address such a situation, it sounds like it would be
>> less complex to just setup a varnish box (With access to the HTCP
>> cache clear packets), on that campus.
>
> This is an idea I've casually thought about but never put any real
> work towards. It would be pretty neat to have something similar to the
> Netflix Open Connect appliance [0] available for Wikimedia projects.

This has a very strong "back to the future" ring to it. We started our
caching layers back in 2004 with an eye towards bandwidth / housing
donations from universities and ISPs [1], and indeed benefited from
such donations in Amsterdam (Kennisnet), Paris and Seoul (Yahoo). Of
these, only the Amsterdam PoP has survived, and is now in our own
management.

Gabriel

[1]: 
http://web.archive.org/web/20040710213535/http://www.aulinx.de/oss/code/wikipedia/

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Parsoid still doesn't love me

2015-11-06 Thread Gabriel Wicke
We don't currently store the full history of each page in RESTBase, so your
first access will trigger an on-demand parse of older revisions not yet in
storage, which is relatively slow. Repeat accesses will load those
revisions from disk (SSD), which will be a lot faster.

With a majority of clients now supporting HTTP2 / SPDY, use cases that
benefit from manual batching are becoming relatively rare. For a use case
like revision retrieval, HTTP2 with a decent amount of parallelism should
be plenty fast.
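
As a rough sketch of the kind of parallelism meant here (Python; the title
and revision ids are placeholders, and an HTTP/2-capable client would
simply multiplex such requests over a single connection):

    import requests
    from concurrent.futures import ThreadPoolExecutor

    revisions = [701384379, 700000000, 699000000]

    def fetch(rev):
        url = 'https://en.wikipedia.org/api/rest_v1/page/html/Dog/%d' % rev
        return requests.get(url).text

    with ThreadPoolExecutor(max_workers=5) as pool:
        pages = list(pool.map(fetch, revisions))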

Gabriel

On Fri, Nov 6, 2015 at 2:24 PM, C. Scott Ananian <canan...@wikimedia.org>
wrote:

> I think your subject line should have been "RESTBase doesn't love me"?
>  --scott
> ​
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>



-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Parsoid convert arbitrary HTML?

2015-11-06 Thread Gabriel Wicke
To add to what Eric & Subbu have said, here is a link to the API
documentation for this end point:

https://en.wikipedia.org/api/rest_v1/?doc#!/Transforms/post_transform_html_to_wikitext_title_revision

On Fri, Nov 6, 2015 at 8:47 AM, Subramanya Sastry <ssas...@wikimedia.org>
wrote:

> On 11/06/2015 10:18 AM, James Montalvo wrote:
>
>> Can Parsoid be used to convert arbitrary HTML to wikitext? It's not clear
>> to me whether it will only work with Parsoid's HTML+RDFa. I'm wondering if
>> I could take snippets of HTML from non-MediaWiki webpages and convert them
>> into wikitext.
>>
>
> The right answer is: "It depends" :-)
>
> As Eric responded in his reply, Parsoid does convert some kinds of
> arbitrary HTML to clean wikitext. See some additional examples at the end
> of this email.
>
> However, if you really threw arbitrary HTML at it (ex: <i>..</i> or
> <b>..</b>) Parsoid wouldn't know that it could potentially use ''
> or ''' for those tags. Or, if you gave it input with all kinds of css and
> other inlined attributes, you won't necessarily get the best wikitext from
> it.
>
> But, if you tried to convert HTML that you got from say Google docs, Open
> Office, Word, or other HTML-generation tools, the wikitext you get may not
> be very pretty.
>
> We do want to keep improving Parsoid's abilities to get there, but it has
> not been a high priority for us, but it would be a great GSoC or volunteer
> project if someone wants to play with this and improve this feature given
> that we are always playing catch up with all the other things we need to
> get done.
>
> But, if you didn't have really arbitrary HTML, you can get some reasonable
> looking wikitext out of it even without the markers. But, things like
> images, templates, extensions .. obviously require the additional
> attributes for Parsoid to generate canonical wikitext for that.
>
> Hope this helps.
>
> Subbu.
>
>
> ---
>
> Some html -> wt examples:
>
> [subbu@earth bin] echo "fooab" | node parse
> --html2wt
> == foo ==
> a
>
> b
> [subbu@earth bin] echo " href='http://en.wikipedia.org/wiki/Hampi'>Hampi"
> | node parse --html2wt
> [[Hampi]]
>
> [subbu@earth bin] echo "Luna"
> | node parse --html2wt
> [[:it:Luna|Luna]]
>
> [subbu@earth bin] echo "Luna"
> | node parse --html2wt --prefix itwiki
> [[Luna]]
>
> [subbu@earth bin] echo "abc" | node
> parse --html2wt
> * a
> * b
> * c
>
> [subbu@earth bin] echo foo" | node parse --html2wt
> foo
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>



-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Inhibitors for Mobile Content Service to use Parsoid output

2015-10-16 Thread Gabriel Wicke
On Fri, Oct 16, 2015 at 2:50 PM, Brian Gerstle <bgers...@wikimedia.org>
wrote:

> I've mentioned this idea before, but having a service which allowed you to
> reliably get image thumbs for a given file at a specified width/height
> would obviate the srcset.



Our thumbs are already created on demand, based on the image width
specified in the URL. Example for a 40px wide thumb:

https://upload.wikimedia.org/wikipedia/commons/thumb/d/d9/Collage_of_Nine_Dogs.jpg/40px-Collage_of_Nine_Dogs.jpg


The corresponding Parsoid HTML contains the original height & width in data
attributes:



Based on this information, it shouldn't be too hard to calculate 1.5x / 2x
resolution thumb urls with a combination of multiplication & rounding.
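
A rough sketch of that calculation, assuming the usual '<width>px-<name>'
thumb URL pattern and using the data-file-width value as the upper bound
(the file width below is a made-up number):

    def scaled_thumb_urls(thumb_src, base_width, file_width, factors=(1.5, 2)):
        # Swap the scaled width into the '<width>px-<name>' suffix, capping
        # at the original file width so we never ask for an upscaled thumb.
        prefix, last = thumb_src.rsplit('/', 1)
        name = last.split('px-', 1)[1]
        return {f: '%s/%dpx-%s'
                   % (prefix, min(int(round(base_width * f)), file_width), name)
                for f in factors}

    print(scaled_thumb_urls(
        'https://upload.wikimedia.org/wikipedia/commons/thumb/d/d9/'
        'Collage_of_Nine_Dogs.jpg/40px-Collage_of_Nine_Dogs.jpg',
        base_width=40, file_width=3000))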


> And prevent cache fragmentation on img resolutions.
>

Isn't the srcset using a limited set of resolution factors?


>
> On Friday, October 16, 2015, Dmitry Brant <dbr...@wikimedia.org> wrote:
>
> > We can indeed fall back to TTS if the spoken article is not available, or
> > offer a choice between TTS and the spoken version. The intention was for
> > this to be a quick win of surfacing a useful, if lesser-known, facet of
> > Wikipedia content.
> >
> > That being said, this doesn't necessarily need to be a blocker for
> > transitioning the Content Service to Parsoid. If all else fails, we can
> > ascertain the audio URL on the client side based on the File page name.
> As
> > for transcodings of video files, we already make a separate API call to
> > retrieve them, so perhaps we can continue to do that until we're able to
> > get them directly from Parsoid?
> > It sounds like a more pressing issue right now is the srcset
> attributes...
> >
> >
> > On Fri, Oct 16, 2015 at 2:30 PM, Luis Villa <lvi...@wikimedia.org
> > <javascript:;>> wrote:
> >
> > > On Fri, Oct 16, 2015 at 11:14 AM, Bernd Sitzmann <be...@wikimedia.org
> > <javascript:;>>
> > > wrote:
> > >
> > > > It looks like Mobile Apps and Mobile Web have different priority
> > > > > requirements from Parsoid here. Looking at
> > > > > https://en.wikipedia.org/wiki/Wikipedia:Spoken_articles, I also
> see
> > > that
> > > > > there are only 1243 spoken wikipedia articles (that are probably
> not
> > > all
> > > > > the latest version of these articles). It also doesn't look like
> the
> > > > video
> > > > > player works currently in mobile web or in mobile apps (except
> maybe
> > > > > Android ?).
> > > >
> > >
> > > With due respect for the hard work people have put in on that project,
> is
> > > there any indication Spoken Articles has any traction and will grow
> > beyond
> > > that ~1K articles? Wouldn't using Android's TTS API to read the most
> > > up-to-date version of the article be a much better user experience (35M
> > > articles, always up-to-date, instead of 1K articles, almost always out
> of
> > > date?)
> > >
> > > Luis
> > >
> > >
> > > --
> > > Luis Villa
> > > Sr. Director of Community Engagement
> > > Wikimedia Foundation
> > > *Working towards a world in which every single human being can freely
> > share
> > > in the sum of all knowledge.*
> > > ___
> > > Wikitech-l mailing list
> > > Wikitech-l@lists.wikimedia.org <javascript:;>
> > > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> > >
> >
> >
> >
> > --
> > Dmitry Brant
> > Mobile Apps Team (Android)
> > Wikimedia Foundation
> > https://www.mediawiki.org/wiki/Wikimedia_mobile_engineering
> > ___
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org <javascript:;>
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
>
>
> --
> EN Wikipedia user page: https://en.wikipedia.org/wiki/User:Brian.gerstle
> IRC: bgerstle
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>



-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] RFC: Optional Travis integration for Jenkins

2015-10-02 Thread Gabriel Wicke
On Fri, Oct 2, 2015 at 6:15 AM, Marko Obrovac 
wrote:
>
>
> While hosted officially on Gerrit, Citoid should be added to this list as
> well. Its proper functioning depends on Zotero being available, so the
> current CI tests for Citoid include only syntax checking via jshint. In
> this case, however, it is unlikely that Travis would help. Ideally,
> isolated CI instances would allow us to have a Zotero container that could
> be spun alongside Citoid during tests.
>

Travis provides isolated containers or VMs, so it should be able to test
Zotero if there is a version of Zotero that is compatible with the Travis
default image.

Gabriel
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] REST v1 API: Replacing bodyOnly flag in wikitext to HTML transform end point with body_only

2015-10-02 Thread Gabriel Wicke
Hello,

in an effort to standardize all post parameters in the REST v1 API to use
snake_case names, we have deprecated the bodyOnly flag in the wikitext to
HTML transform end point [1]. Instead, clients should use body_only, as
mentioned in the documentation.

We plan to remove support for the old form by the end of November.

Thanks,

Gabriel

[1]:
https://en.wikipedia.org/api/rest_v1/?doc#!/Transforms/post_transform_wikitext_to_html_title_revision
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] thumb generation

2015-09-17 Thread Gabriel Wicke
On Wed, Sep 16, 2015 at 12:51 AM, Federico Leva (Nemo) <nemow...@gmail.com>
wrote:

> Have you looked into what mwoffliner does?
> https://sourceforge.net/p/kiwix/other/ci/master/tree/mwoffliner/mwoffliner.js


+1 for mwoffliner. It should be *very* close to what you are looking for,
and avoids the need to parse wikitext itself by fetching the HTML from the
Wikimedia REST API.


>
> Maybe you can even just extract the images from the ZIM files.
>
> Nemo
>
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>



-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] QA: Holding our code to better standards.

2015-09-03 Thread Gabriel Wicke
In the services team, we found that prominent coverage metrics are a very
powerful motivator for keeping tests in order. We have set up 'voting'
coverage reports, which fail the overall tests if coverage falls, and make
it easy to check which lines aren't covered yet (via coveralls). In all
repositories we enabled this for, test coverage has since stabilized around
80-90%.

Gabriel

On Thu, Sep 3, 2015 at 4:31 PM, Steven Walling <steven.wall...@gmail.com>
wrote:

> Just to hop on the bandwagon here: this seems like the only sane path going
> forward. One unmentioned benefit is that this is a step toward continuous
> deployment. Having integration tests run on every commit and then block
> when there are failures is pretty much a requirement if Wikimedia ever
> wants to get there.
>
> On Thu, Sep 3, 2015 at 1:43 PM Pine W <wiki.p...@gmail.com> wrote:
>
> > I just want to say that I appreciate this overview.
> >
> > Pine
> > ___
> > Wikitech-l mailing list
> > Wikitech-l@lists.wikimedia.org
> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>



-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] What happened to our user agent requirements?

2015-09-01 Thread Gabriel Wicke
On Tue, Sep 1, 2015 at 5:54 PM, Gergo Tisza  wrote:
>
>
> Rate limiting / UA policy enforcement has to be done in Varnish, since API
> responses can be cached there and so the requests don't necessarily reach
> higher layers (and we wouldn't want to vary on user agent).



The cost / benefit trade-offs for Varnish cache hits are fairly different
from those of cache misses. Especially for in-memory (frontend) hits it
might overall be cheaper to send a regular response, rather than adding
rate limit overheads to each cache hit.
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] What happened to our user agent requirements?

2015-09-01 Thread Gabriel Wicke
We recently revisited rate limiting in
https://phabricator.wikimedia.org/T107934, but came to similar conclusions
as reached in this thread:


   - Limits for weak identifiers like IPs or user agents would (at least
   initially) need to be high enough to render the limiting borderline useless
   against DDOS attacks.
   - Stronger authentication requirements have significant costs to users,
   and will require non-trivial backend work to keep things efficient on our
   end. I believe we should tackle this backend work in any case, but it will
   take some time.
   - In our benchmarks, most off-the-shelf rate limiting libraries use
   per-request network requests to a central service like Redis, which costs
   latency and throughput, and has some scaling challenges. There are
   algorithms [1] that trade some precision for performance, but we aren't
   aware of any open source implementations we could use (see the sketch
   below).
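
As an illustration only (this is not the algorithm from [1]), the local
building block of such a scheme is a simple in-process token bucket that
avoids any per-request network round trip; the limits and keying below
are placeholders:

    import time

    class TokenBucket(object):
        def __init__(self, rate, burst):
            self.rate = rate        # tokens added per second
            self.burst = burst      # maximum bucket size
            self.tokens = burst
            self.last = time.time()

        def allow(self):
            now = time.time()
            self.tokens = min(self.burst,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

    buckets = {}   # e.g. keyed by client IP, periodically reconciled centrally

    def allowed(client_ip, rate=10, burst=50):
        return buckets.setdefault(client_ip, TokenBucket(rate, burst)).allow()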

The dual of rate limiting is making each API request cheaper. We have
recently made some progress towards limiting the cost of individual API
requests, and are working towards making most API end points cacheable &
backed by storage.

Gabriel

[1]:
http://yahooeng.tumblr.com/post/111288877956/cloud-bouncer-distributed-rate-limiting-at-yahoo


On Tue, Sep 1, 2015 at 4:54 PM, Brandon Black <bbl...@wikimedia.org> wrote:

> On Tue, Sep 1, 2015 at 10:42 PM, Platonides <platoni...@gmail.com> wrote:
> > Brad Jorsch (Anomie) wrote:
> >> I wonder if it got lost in the move from Squid to Varnish, or something
> >> along those lines.
> > That's likely, given that it was enforced by squid.
>
> We could easily add it back in Varnish, too, but I tend to agree with
> Brion's points that it's not ultimately helpful.
>
> I really do like the idea of moving towards smarter ratelimiting of
> APIs by default, though (and have brought this up in several contexts
> recently, but I'm not really aware of whatever past work we've done in
> that direction).  From that relatively-ignorant perspective, I tend to
> envision an architecture where the front edge ratelimits API requests
> (or even possibly, all requests, but we'd probably have to exclude a
> lot of common spiders...) via a simple token-bucket-filter if they're
> anonymous, but lets them run free if they superficially appear to have
> a legitimate cookie or API access token.  Then it's up to the app
> layer to enforce limits for the seemingly-identifiable traffic and be
> configurable to raise them for legitimate remote clients we've had
> contact with, and to reject legitimate-looking tokens/logins that the
> edge choses not to ratelimit which aren't actually legitimate.
>
> -- Brandon
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>



-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Engineering] Content WG: Templating, Page Components editing

2015-08-13 Thread Gabriel Wicke
Etherpad:
https://etherpad.wikimedia.org/p/Templates,_Page_Components_and_editing

On Wed, Aug 12, 2015 at 8:36 AM, Bryan Davis bd...@wikimedia.org wrote:

 On Tue, Aug 11, 2015 at 6:12 PM, Gabriel Wicke gwi...@wikimedia.org
 wrote:
  TL;DR: Join us to discuss Templates, Page Components & editing on Thu, 13
  August, 12:45 – 14:00 PDT [0].
 
  Please join us at:
 
  Thu, 13 August, 12:45 – 14:00 PDT [0]
 
  by joining the BlueJeans conference call [1],
  on IRC, in #wikimedia-meeting, or
  in Room 37 in the WMF office.
 
  [1]: https://bluejeans.com/2061103652, via phone +14087407256, meeting
 id

 To access the conference without installing the BlueJeans browser
 plugin, join using the open WebRTC protocol [2] via the URL
 https://bluejeans.com/2061103652/webrtc.

 [2]: http://www.webrtc.org/

 Bryan
 --
 Bryan Davis  Wikimedia Foundationbd...@wikimedia.org
 [[m:User:BDavis_(WMF)]]  Sr Software EngineerBoise, ID USA
 irc: bd808v:415.839.6885 x6855




-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Mediawiki-api] Bikeshedding a good name for the api.php API

2015-08-11 Thread Gabriel Wicke
Last night I tweaked the listing at https://en.wikipedia.org/api/ to read:

* Action API, providing rich queries, editing and content access.
* REST API v1, mainly focused on high-volume content access.

The PHP prefix seemed to confuse some people, who took it to be a
PHP-specific API.

Another suggestion was to call it MediaWiki Action API, in the hope of
getting better name recognition. However, both APIs have a claim to be
MediaWiki APIs in the wider sense, so this distinction might only be
meaningful in the short term.

Gabriel

On Tue, Aug 11, 2015 at 6:24 AM, Brad Jorsch (Anomie) bjor...@wikimedia.org
 wrote:

 On Mon, Aug 10, 2015 at 7:13 PM, Ricordisamoa 
 ricordisa...@openmailbox.org
 wrote:

  Since the current REST API is available under v1, my take is the v0
  API :-)


 That name sucks because it implies that the REST API is supposed to replace
 it.

 --
 Brad Jorsch (Anomie)
 Senior Software Engineer
 Wikimedia Foundation
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l




-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] RFC: Replace Tidy with HTML 5 parse/reserialize

2015-08-11 Thread Gabriel Wicke
On Tue, Aug 11, 2015 at 5:16 PM, Trevor Parscal tpars...@wikimedia.org
wrote:

 Is it possible use part of the Parsoid code to do this?


It is possible to do this in Parsoid (or any node service) with this line:

 var sanerHTML = domino.createDocument(input).outerHTML;

However, performance is about 2x worse than current tidy (116ms vs. 238ms
for Obama), and about 4x slower than the fastest option in our tests. The
task has a lot more benchmarks of various options.
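
For reference, a self-contained sketch of that approach (assuming the domino
npm package; input handling kept deliberately minimal):

    // Sketch: HTML5-parse and reserialize a document with domino.
    // createDocument() applies the HTML5 tree-building (error recovery)
    // rules; serializing the resulting document yields well-formed markup.
    var fs = require('fs');
    var domino = require('domino');

    var input = fs.readFileSync(process.argv[2], 'utf8');
    var sanerHTML = domino.createDocument(input).outerHTML;
    process.stdout.write(sanerHTML);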

Gabriel






 - Trevor

 On Tuesday, August 11, 2015, Tim Starling tstarl...@wikimedia.org wrote:

  I'm elevating this task of mine to RFC status:
 
  https://phabricator.wikimedia.org/T89331
 
  Running the output of the MediaWiki parser through HTML Tidy always
  seemed like a nasty hack. The effects on wikitext syntax are arbitrary
  and change from version to version. When we upgrade our Linux
  distribution, we sometimes see changes in the HTML generated by given
  wikitext, which is not ideal.
 
  Parsoid took a different approach. After token-level transformations,
  tokens are fed into the HTML 5 parse algorithm, a complex but
  well-specified algorithm which generates a DOM tree from quirky input
  text.
 
  http://www.w3.org/TR/html5/syntax.html
 
  We can get nearly the same effect in MediaWiki by replacing the Tidy
  transformation stage with an HTML 5 parse followed by serialization of
  the DOM back to HTML. This would stabilize wikitext syntax and resolve
  several important syntax differences compared to Parsoid.
 
  However:
 
  * I have not been able to find any PHP implementation of this
  algorithm. Masterminds and Ressio do not even attempt it. Electrolinux
  attempts it but does not implement the error recovery parts that are
  of interest to us.
  * Writing our own would be difficult.
  * Even if we did write it, it would probably be too slow.
 
  So the question is: what language should we use? Since this is the
  standard programmer troll question, please bring popcorn.
 
  The best implementation of this algorithm is in Java: the validator.nu
  parser is maintained by Mozilla, and has source translation to C++,
  which is used by Mozilla and could potentially be used for an HHVM
  extension.
 
  There is also a Rust port (also written by Mozilla), and notable
  implementations in JavaScript and Python.
 
  For WMF, a Java service would be quite easily done, and I have
  prototyped it already. An HHVM extension might also be possible. A
  non-service fallback for small installations might be Node.js or a
  compiled binary from Rust or C++.
 
  -- Tim Starling
 
 
  ___
  Wikitech-l mailing list
  Wikitech-l@lists.wikimedia.org javascript:;
  https://lists.wikimedia.org/mailman/listinfo/wikitech-l
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l




-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Content WG: Templating, Page Components editing

2015-08-11 Thread Gabriel Wicke
TL;DR: Join us to discuss Templates, Page Components & editing on Thu, 13
August, 12:45 – 14:00 PDT [0].


Hello all,

Recent discussions, including the pre-Wikimania content brainstorming
[2][3], brought up several important questions about the next steps for
MediaWiki's and particularly Wikimedia's content representation, storage,
change propagation, and caching. Many of those questions directly affect
ongoing work, so it would be good to get more clarity on them soon. To this
end, I am proposing we meet every two weeks & discuss one major area at a
time. I think we have enough topics for four meetings over two months
[2][3], after which we can re-evaluate the approach.

As the first topic, I would like to propose *Templates, Page Components &
editing*. Gradual improvements in this area should let us broaden the
support for different devices, improve the editing experience, and speed up
rendering and updates. There has been a lot of discussion and activity on
this recently, including a talk by C.Scott at Wikimania [4], Jon's
Wikidata-driven infoboxes on the mobile site [5], Marius's Lua-based
infobox programming idea [6], and Wikia's declarative infobox components
[7]. This summary task [8] has a list of related resources.

Concretely, we could try to answer these questions:

   - Can we find satisfactory general abstractions for page components
   (well-formed content blocks)?
   - What are the requirements for editing, RL module / metadata
   aggregation, dependency tracking?
   - Should we evolve wikitext templates into well-formed page components?


Please join us at:

*Thu, 13 August, 12:45 – 14:00 PDT* [0]

   - by joining the BlueJeans conference call [1],
   - on IRC, in #wikimedia-meeting, or
   - in Room 37 in the WMF office.

See you there,

Gabriel

[0]: http://www.timeanddate.com/worldclock/fixedtime.html?iso=20150813T1945
[1]: https://bluejeans.com/2061103652, via phone +14087407256, meeting id
2061103652
[2]: https://phabricator.wikimedia.org/T99088
[3]: https://etherpad.wikimedia.org/p/Content_platform
[4]:
https://upload.wikimedia.org/wikipedia/commons/0/08/Templates_are_dead!_Long_live_templates!.pdf
[5]: https://en.m.wikipedia.org/wiki/Albert_Einstein?mobileaction=alpha
[6]: https://www.mediawiki.org/wiki/Extension:Capiunto
[7]: http://community.wikia.com/wiki/Thread:841717,
http://infoboxpreview.appspot.com/
[8]: https://phabricator.wikimedia.org/T105845
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] RFC: Replace Tidy with HTML 5 parse/reserialize

2015-08-11 Thread Gabriel Wicke
On Tue, Aug 11, 2015 at 5:24 PM, Trevor Parscal tpars...@wikimedia.org
wrote:

 Interesting. What is the cause of the slower speed?


Mainly a pure-JS DOM implementation (domino) not being quite the same speed
as C or Rust with all optimizations turned on. The deltas are roughly in
line with language benchmarks like http://benchmarksgame.alioth.debian.org/.

Gabriel
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] I love Parsoid but it doesn't want me

2015-07-24 Thread Gabriel Wicke
On Fri, Jul 24, 2015 at 10:58 AM, Ricordisamoa ricordisa...@openmailbox.org
 wrote:


 RESTBase could help you there. With one API call, you can get the (stored)
 latest HTML revision of a page in Parsoid format [1], but without the need
 to wait for Parsoid to parse it (if the latest revision is in RESTBase's
 storage).


 What if it isn't?



If it is not in storage, then it will be generated transparently. This
should only sometimes happen when you request a revision less than a
handful of seconds after it was saved.


  There is also section API support (you can get individual HTML
 fragments of a page by ID, and send only those back for transformation
 into
 wikitext [2]). There is also support for page editing (aka saving), but
 these endpoints have not yet been enabled for WMF wikis in production due
 to security concerns.


 Then I guess HTML would have to be converted into wikitext before saving?
 +1 API call


As Marko mentioned, the HTML save end point is not yet enabled in
production. Once it is, you will be able to directly POST modified HTML to
save it, without adding a VisualEditor tag or having to perform extra API
requests.

Gabriel
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Engineering] Upcoming leap second on Tue 30th

2015-06-25 Thread Gabriel Wicke
Alex and Moritz,

thank you for taking care of this. These leap seconds are a real pain in
the butt for time-based distributed systems, and I'm glad that we have a
plan in place. I hope the movement to abolish leap seconds
https://en.wikipedia.org/wiki/Leap_second#Proposal_to_abolish_leap_seconds
wins out in the end!

On Thu, Jun 25, 2015 at 7:27 AM, Moritz Mühlenhoff mor...@wikimedia.org
wrote:

 * On the 1st of July we'll re-enable NTP in batches. System clocks will
 move forward by a second once NTP is started again,


To clarify: By default, system time will move *backwards* one second.

We just talked about this on IRC, so just for other's benefit: With NTP's
-x option we should be able to smear the adjustment (by slowing down the
system clock temporarily) until the leap second is incorporated into the
system time. This avoids non-monotonicity, which is important for systems
that use time to capture causality. It would be great to apply the
adjustment to all nodes of the cassandra cluster at once, so that their
clocks are being slewed in lock-step.

Gabriel
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Modernizing our content platform: Kick-off meeting on Tuesday

2015-06-23 Thread Gabriel Wicke
Reminder: This is today!

When: *Tuesday, June 23rd, 13:00 - 14:30 PT* [3]
Where:
* *https://plus.google.com/hangouts/_/wikimedia.org/contentplatform*
* *room 37* in the office

On Fri, Jun 19, 2015 at 12:00 PM, Gabriel Wicke gwi...@wikimedia.org
wrote:

 Hi all,

 a few of us have recently collected and roughly prioritized some open
 architectural questions [1]. The area that stood out as needing most urgent
 attention is adapting our content platform to long-term changes in the way
 users interact with our site [2]. People are using a wider range of
 devices, from feature phones to multi-core desktops. Many users are looking
 for short factoids and definitions, while others prefer to immerse
 themselves in detailed articles with rich multimedia content.

 MediaWiki is currently not very optimized to support such a diverse set of
 use cases. To address this, we see a need to improve our platform in the
 following areas:


- Storage: To better separate data from presentation, we need the
ability to store multiple bits of content and metadata associated with each
revision. This storage needs to integrate well with edits, history views,
and other features, and should be exposed via a high-performance API.
- Change propagation: Edits to small bits of data need to be reliably
and efficiently propagated to all content depending on it. The machinery
needed to track dependencies should be easy to use.
- Content composition and caching: Separate data gives us the freedom
to render infoboxes, graphs or multimedia elements dynamically, depending
on use case and client. For performance and flexibility, it would be
desirable to assemble at least some of these renders as late as possible,
at the edge or on the client.


 We don't expect to tackle all of this at once, but are starting to look
 into several areas. If you are interested in helping, then we would like to
 invite you to join us for a kick-off meeting:

 *When: Tuesday, June 23rd, 13:00 - 14:30 PT [3]*
 *Where: *A *hangout* link will be posted here before the meeting; room 37
 in the office.

 If you can't attend, then please have a look at our current notes and let
 us know what you think [2].

 Gabriel Wicke, Daniel Kinzler, Brion Vibber, Tim Starling, Roan Kattouw,
 Ori Livneh


 [1]: https://phabricator.wikimedia.org/T96903
 [2]: https://phabricator.wikimedia.org/T99088
 [3]:
 http://www.timeanddate.com/worldclock/fixedtime.html?msg=MediaWiki+content+platform+kick-off&iso=20150623T13&p1=224&ah=1&am=30




-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Engineering] Modernizing our content platform: Kick-off meeting on Tuesday

2015-06-23 Thread Gabriel Wicke
Vibha,

sorry I missed your reply before the meeting.

The discussion was mostly technical and centered around two main areas:

- content storage and dependency tracking / change propagation
- page content representation, content composition and editing

We collected and discussed a list of use cases and their respective
challenges on an etherpad [1]. In the end, we resolved to follow up with
more focused work around the two main themes. I'll summarize the discussion
in a task per area and post them here.

Gabriel

[1]: http://etherpad.wikimedia.org/p/Content_platform

On Tue, Jun 23, 2015 at 1:16 PM, Adam Baso ab...@wikimedia.org wrote:

 Updated URLs, we're in R37

 on air stream: http://youtu.be/RcE2kecrsIk
 Max of 15 users:
 https://plus.google.com/hangouts/_/hoaevent/AP36tYdub-Rs4mI_4UjTEzTgU7GKBkgjV5s0kXASoA9Tno4gJK34_Q

 On Tue, Jun 23, 2015 at 12:18 PM, Vibha Bamba vba...@wikimedia.org
 wrote:

 This sounds fairly dev centric with Front end/ UX implications.
 Will the discussion be fairly technical? Let us know if Design should
 attend.

 
 Vibha Bamba
 Senior Designer | WMF Design







 On Tue, Jun 23, 2015 at 11:12 AM, Gabriel Wicke gwi...@wikimedia.org
 wrote:

 Reminder: This is today!

 When: *Tuesday, June 23rd, 13:00 - 14:30 PT* [3]
 Where:
 * *https://plus.google.com/hangouts/_/wikimedia.org/contentplatform*
 * *room 37* in the office

 On Fri, Jun 19, 2015 at 12:00 PM, Gabriel Wicke gwi...@wikimedia.org
 wrote:

 Hi all,

 a few of us have recently collected and roughly prioritized some open
 architectural questions [1]. The area that stood out as needing most urgent
 attention is adapting our content platform to long-term changes in the way
 users interact with our site [2]. People are using a wider range of
 devices, from feature phones to multi-core desktops. Many users are looking
 for short factoids and definitions, while others prefer to immerse
 themselves in detailed articles with rich multimedia content.

 MediaWiki is currently not very optimized to support such a diverse set
 of use cases. To address this, we see a need to improve our platform in the
 following areas:


- Storage: To better separate data from presentation, we need the
ability to store multiple bits of content and metadata associated with 
 each
revision. This storage needs to integrate well with edits, history 
 views,
and other features, and should be exposed via a high-performance API.
- Change propagation: Edits to small bits of data need to be
reliably and efficiently propagated to all content depending on it. The
machinery needed to track dependencies should be easy to use.
- Content composition and caching: Separate data gives us the
freedom to render infoboxes, graphs or multimedia elements dynamically,
depending on use case and client. For performance and flexibility, it 
 would
be desirable to assemble at least some of these renders as late as
possible, at the edge or on the client.


 We don't expect to tackle all of this at once, but are starting to look
 into several areas. If you are interested in helping, then we would like to
 invite you to join us for a kick-off meeting:

 *When: Tuesday, June 23rd, 13:00 - 14:30 PT [3]*
 *Where: *A *hangout* link will be posted here before the meeting; room
 37 in the office.

 If you can't attend, then please have a look at our current notes and
 let us know what you think [2].

 Gabriel Wicke, Daniel Kinzler, Brion Vibber, Tim Starling, Roan
 Kattouw, Ori Livneh


 [1]: https://phabricator.wikimedia.org/T96903
 [2]: https://phabricator.wikimedia.org/T99088
 [3]:
 http://www.timeanddate.com/worldclock/fixedtime.html?msg=MediaWiki+content+platform+kick-off&iso=20150623T13&p1=224&ah=1&am=30




 --
 Gabriel Wicke
 Principal Engineer, Wikimedia Foundation

 ___
 Engineering mailing list
 engineer...@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/engineering



 ___
 Engineering mailing list
 engineer...@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/engineering





-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Modernizing our content platform: Kick-off meeting on Tuesday

2015-06-19 Thread Gabriel Wicke
Hi all,

a few of us have recently collected and roughly prioritized some open
architectural questions [1]. The area that stood out as needing most urgent
attention is adapting our content platform to long-term changes in the way
users interact with our site [2]. People are using a wider range of
devices, from feature phones to multi-core desktops. Many users are looking
for short factoids and definitions, while others prefer to immerse
themselves in detailed articles with rich multimedia content.

MediaWiki is currently not very optimized to support such a diverse set of
use cases. To address this, we see a need to improve our platform in the
following areas:


   - Storage: To better separate data from presentation, we need the
   ability to store multiple bits of content and metadata associated with each
   revision. This storage needs to integrate well with edits, history views,
   and other features, and should be exposed via a high-performance API.
   - Change propagation: Edits to small bits of data need to be reliably
   and efficiently propagated to all content depending on it. The machinery
   needed to track dependencies should be easy to use.
   - Content composition and caching: Separate data gives us the freedom to
   render infoboxes, graphs or multimedia elements dynamically, depending on
   use case and client. For performance and flexibility, it would be desirable
   to assemble at least some of these renders as late as possible, at the edge
   or on the client.


We don't expect to tackle all of this at once, but are starting to look
into several areas. If you are interested in helping, then we would like to
invite you to join us for a kick-off meeting:

*When: Tuesday, June 23rd, 13:00 - 14:30 PT [3]*
*Where: *A *hangout* link will be posted here before the meeting; room 37
in the office.

If you can't attend, then please have a look at our current notes and let
us know what you think [2].

Gabriel Wicke, Daniel Kinzler, Brion Vibber, Tim Starling, Roan Kattouw,
Ori Livneh


[1]: https://phabricator.wikimedia.org/T96903
[2]: https://phabricator.wikimedia.org/T99088
[3]:
http://www.timeanddate.com/worldclock/fixedtime.html?msg=MediaWiki+content+platform+kick-off&iso=20150623T13&p1=224&ah=1&am=30
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Wmfall] Welcome Darian Patrick

2015-05-19 Thread Gabriel Wicke
Welcome, Darian. Glad to have you on board!

On Tue, May 19, 2015 at 3:38 PM, Bahodir Mansurov bmansu...@wikimedia.org
wrote:

 Welcome, Darian!

  On May 19, 2015, at 6:31 PM, Pine W wiki.p...@gmail.com wrote:
 
  Hi Darian,
 
  Welcome to Wikimedia. I'd like to invite you to subscribe to the Cascadia
  Wikimedians email list; we're the regional affiliate for Oregon,
 Washington
  State, and British Columbia. Signup is at
  https://lists.wikimedia.org/mailman/listinfo/wikimedia-cascadia
 
  I'm cc'ing my colleague Another Believer who is in Portland. I hope that
  the two of you can find time to meet.
 
  Cheers,
 https://lists.wikimedia.org/mailman/listinfo/wikimedia-cascadia
 
  Pine
 
 
  On Tue, May 19, 2015 at 3:27 PM, Brion Vibber bvib...@wikimedia.org wrote:
 
  Welcome aboard!
 
  -- brion
 
  On Tuesday, May 19, 2015, Chris Steipp cste...@wikimedia.org wrote:
 
  Hi all,
 
  I'd like to introduce Darian Anthony Patrick, our new Application
  Security
  Engineer for the foundation! Darian joins me as a member of the newly
  formed Security Team. He comes from Aspect Security, where he provided
  code/architecture reviews and pen testing to large national and
  international financial institutions. Darian will be working remotely
  from
  Portland, OR. You can find him on irc as dapatrick. Darian will focus
 on
  maintaining and improving the security of MediaWiki and other software
 at
  the WMF.
 
  In his own words,
 
  “I’m super excited to join the organization, and I look forward to
  working with you all.”
 
  Welcome Darian!
 
  ___
  Wikitech-l mailing list
  Wikitech-l@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/wikitech-l
 
  ___
  Wikitech-l mailing list
  Wikitech-l@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/wikitech-l
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l




-- 
Gabriel Wicke
Principal Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Moritz Muehlenhoff joins as Ops Security Engineer

2015-04-02 Thread Gabriel Wicke
Welcome, Moritz! The German cabal is getting stronger again ;)

Gabriel

On Thu, Apr 2, 2015 at 10:50 AM, Tomasz Finc tf...@wikimedia.org wrote:

 Welcome Moritz, great to have you here

 On Thu, Apr 2, 2015 at 2:07 AM, Mark Bergsma m...@wikimedia.org wrote:
  Hi all,
 
  I'm very pleased to announce that as of stoday/syesterday, Moritz
  Mühlenhoff will be joining the Ops team in the role of Operations
  Security Engineer. We're excited as for the first time we'll have an
  engineer on our team able to focus on enhancing the security of our
  infrastructure.
 
  Some of you Debian users may recognize his name; in his spare time
  he's very active in the Debian Security Team and sends out a large
  portion of their security advisory mails. ;)
 
  Moritz lives in Bremen, North Germany (internationally perhaps best
  known for being the home of Beck's beer) with his spouse Silvia and
  their 16 m/o son Tjark. Besides being a Debian Developer, he also very
  much enjoys Rugby Union and plays tighthead prop in his local club
  Union 60 Bremen in the third divison of Germany. He used to be a
  frequent visitor of film festivals such as the San Sebastian festival,
  but with the baby around home theatre has become more prevalent. :-)
 
  Moritz is working with us remotely, and can usually be found using his
  nick jmm on Freenode.
 
  Please join me in welcoming Moritz to the team!
 
  --
  Mark Bergsma m...@wikimedia.org
  Lead Operations Architect
  Director of Technical Operations
  Wikimedia Foundation
 
  ___
  Wikitech-l mailing list
  Wikitech-l@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/wikitech-l

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Parsoid performance metrics

2015-03-31 Thread Gabriel Wicke
Christy,

thank you for making this happen! Having continuously-tracked metrics for
the key performance characteristics will help us keep tabs on regressions &
clearly highlight performance improvements as they happen. Both are great
motivators.

Thank you for your work & best of luck with your next projects!

Gabriel

On Tue, Mar 31, 2015 at 11:18 AM, Subramanya Sastry ssas...@wikimedia.org
wrote:


 Thanks Christy for your work on the project. Your work in instrumenting
 Parsoid and providing us with the dashboards is quite useful and will help
 us keep on top of perf regressions, and identifying things to improve.

 Subbu.


 On 03/31/2015 01:04 PM, E.C Okpo wrote:

 Hello,

 Parsoid now has dashboards that track performance metrics for both the
 html
 to wikitext (1) and wikitext to html (2) routes. Performance
 instrumentation was achieved with StatsD, Graphite and Grafana.

 I also compiled a guide (3) to this process for future reference, though
 your mileage might vary.

 These materials were created as part of my FOSS-OPW Internship with the
 Parsoid team, which ends today :(. It's been such a blast working with the
 Parsoid team, meeting members of the community and getting a taste of
 working on Open Source Software.

 Regards,
 Christy Okpo

 (1) http://grafana.wikimedia.org/#/dashboard/db/parsoid-timing-html2wt
 (2) http://grafana.wikimedia.org/#/dashboard/db/parsoid-timing-wt2html
 (3)
 https://www.mediawiki.org/w/index.php?title=Parsoid/
 Adding_instrumentation_how-to
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l



 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Wikitech] VisualEditor on Wikipedia now faster with RESTBase

2015-03-25 Thread Gabriel Wicke
Jon,

On Wed, Mar 25, 2015 at 9:08 AM, Jon Robson jdlrob...@gmail.com wrote:

 Yes!!!
 This is really exciting and I'm keen to start exploring this on mobile web.
 Gabriel will you be running a session in Lyon around this? I'd be keen to
 explore mobile web using this where possible (at least in our experimental
 mode) and helping this api grow and mature further.


we'll definitely have a session on https://rest.wikimedia.org and RESTBase
in Lyon. If things go to plan we should also have at least a prototype of
the HTML section edit API ready. Will make sure to keep you in the loop on
that as we start work on it.

Am looking forward to your ideas and feedback!

Gabriel
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] VisualEditor on Wikipedia now faster with RESTBase

2015-03-19 Thread Gabriel Wicke
Hello all,


Earlier this morning, we made some good progress towards a faster
VisualEditor experience by loading the HTML from https://rest.wikimedia.org/,
the REST content API that entered beta production a bit over a week ago
[1]. Preliminary data shows a drop of mean client HTML load times by close
to 40% from about 1.9 seconds to 1.2 seconds.


The reasons for this speed-up are primarily



   - a reduction in HTML size by 30-40%, achieved by storing page metadata
   separately in RESTBase [2], and
   - storing (rather than caching) the HTML of all Wikipedia articles, thus
   eliminating expensive cache misses.


So far we have enabled this optimization on all Wikipedias. Other projects
with VisualEditor support will follow over the next week. There are also a
lot more optimizations in the pipeline. Eventually, we hope to completely
eliminate the need to re-load the page for editing by using the same
Parsoid-generated HTML for regular page views.


While many people helped to make RESTBase and the content API a reality
(see the original announcement [1]), I want to specially call out Marko
Obrovac for doing much of the integration work with MediaWiki and the
VisualEditor extension.


I hope that you enjoy the newly faster VisualEditor experience as much as
we do!


Sincerely --


Gabriel Wicke


Principal Software Engineer, Wikimedia Foundation


[1]: https://lists.wikimedia.org/pipermail/wikitech-l/2015-March/081135.html

[2]: https://www.mediawiki.org/wiki/RESTBase
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Wmfall] VisualEditor on Wikipedia now faster with RESTBase

2015-03-19 Thread Gabriel Wicke
On Thu, Mar 19, 2015 at 4:50 PM, Jared Zimmerman 
jared.zimmer...@wikimedia.org wrote:

 https://en.wikipedia.org/wiki/Barack_Obama?veaction=edit just loaded in 2
 seconds.



Much of this is also owed to *a lot* of optimization work in VisualEditor
over the last months. Plenty of ingenuity and hard work by the entire
VisualEditor team and Ori went into making this possible.

Cheers!

Gabriel
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Wikimedia REST content API is now available in beta

2015-03-12 Thread Gabriel Wicke
Hi Gerard,

On Thu, Mar 12, 2015 at 4:32 AM, Gerard Meijssen gerard.meijs...@gmail.com
wrote:

 Hoi,
 In what way will we know how useful this is? Will we have usage statistics
 ?


yes, we have metrics on request rates, status codes and response times by
end point. Here is a dashboard showing some of those metrics:

http://grafana.wikimedia.org/#/dashboard/db/restbase

The first users will be VisualEditor and other Parsoid clients. The
VisualEditor integration has been working out of the box on
https://test.wikipedia.org/, so the next step will be to switch
VisualEditor to use RESTBase on other phase0 wikis (notably mediawiki.org)
as well.

With all current revisions available from storage we are now also in a
position to offer HTML dumps again, which is tracked in
https://phabricator.wikimedia.org/T17017.

Gabriel
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Wikimedia REST content API is now available in beta

2015-03-10 Thread Gabriel Wicke
Hello all,

I am happy to announce the beta release of the Wikimedia REST Content API
at

https://rest.wikimedia.org/

Each domain has its own API documentation, which is auto-generated from
Swagger API specs. For example, here is the link for the English Wikipedia:

https://rest.wikimedia.org/en.wikipedia.org/v1/?doc

At present, this API provides convenient and low-latency access to article
HTML, page metadata and content conversions between HTML and wikitext.
After extensive testing we are confident that these endpoints are ready for
production use, but have marked them as 'unstable' until we have also
validated this with production users. You can start writing applications
that depend on it now, if you aren't afraid of possible minor changes
before transitioning to 'stable' status. For the definition of the terms
‘stable’ and ‘unstable’ see https://www.mediawiki.org/wiki/API_versioning .

While general and not specific to VisualEditor, the selection of endpoints
reflects this release's focus on speeding up VisualEditor. By storing
private Parsoid round-trip information separately, we were able to reduce
the HTML size by about 40%. This in turn reduces network transfer and
processing times, which will make loading and saving with VisualEditor
faster. We are also switching from a cache to actual storage, which will
eliminate slow VisualEditor loads caused by cache misses. Other users of
Parsoid HTML like Flow, HTML dumps, the OCG PDF renderer or Content
translation will benefit similarly.

But, we are not done yet. In the medium term, we plan to further reduce the
HTML size by separating out all read-write metadata. This should allow us
to use Parsoid HTML with its semantic markup
https://www.mediawiki.org/wiki/Parsoid/MediaWiki_DOM_spec directly for
both views and editing without increasing the HTML size over the current
output. Combined with performance work in VisualEditor, this has the
potential to make switching to visual editing instantaneous and free of any
scrolling.

We are also investigating a sub-page-level edit API for micro-contributions
and very fast VisualEditor saves. HTML saves don't necessarily have to wait
for the page to re-render from wikitext, which means that we can
potentially make them faster than wikitext saves. For this to work we'll
need to minimize network transfer and processing time on both client and
server.

More generally, this API is intended to be the beginning of a multi-purpose
content API. Its implementation (RESTBase
http://www.mediawiki.org/wiki/RESTBase) is driven by a declarative
Swagger API specification, which helps to make it straightforward to extend
the API with new entry points. The same API spec is also used to
auto-generate the aforementioned sandbox environment, complete with handy
'try it' buttons. So, please give it a try and let us know what you think!
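
As a quick illustration, fetching the stored HTML for an article from Node
could look like the following (the path is taken from the per-domain docs
linked above and may still change while these end points are marked
'unstable'):

    // Sketch: retrieve the stored Parsoid HTML for one article.
    var https = require('https');

    var title = encodeURIComponent('Barack_Obama');
    var url = 'https://rest.wikimedia.org/en.wikipedia.org/v1/page/html/' + title;

    https.get(url, function(res) {
        var body = '';
        res.setEncoding('utf8');
        res.on('data', function(chunk) { body += chunk; });
        res.on('end', function() {
            console.log(res.statusCode, body.length + ' characters of HTML');
        });
    });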

This API is currently unmetered; we recommend that users not perform more
than 200 requests per second and may implement limitations if necessary.

I also want to use this opportunity to thank all contributors who made this
possible:

- Marko Obrovac, Eric Evans, James Douglas and Hardik Juneja on the
Services team worked hard to build RESTBase, and to make it as extensible
and clean as it is now.

- Filippo Giunchedi, Alex Kosiaris, Andrew Otto, Faidon Liambotis, Rob
Halsell and Mark Bergsma helped to procure and set up the Cassandra storage
cluster backing this API.

- The Parsoid team with Subbu Sastry, Arlo Breault, C. Scott Ananian and
Marc Ordinas i Llopis is solving the extremely difficult task of converting
between wikitext and HTML, and built a new API that lets us retrieve and
pass in metadata separately.

- On the MediaWiki core team, Brad Jorsch quickly created a minimal
authorization API that will let us support private wikis, and Aaron Schulz,
Alex Monk and Ori Livneh built and extended the VirtualRestService that
lets VisualEditor and MediaWiki in general easily access external services.

We welcome your feedback here: https://www.mediawiki.org/wiki/Talk:RESTBase
- and in Phabricator
https://phabricator.wikimedia.org/maniphest/task/create/?projects=RESTBase&title=Feedback:
.

Sincerely --

Gabriel Wicke

Principal Software Engineer, Wikimedia Foundation
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Announcing service-runner, a startup module / supervisor for node services

2015-02-24 Thread Gabriel Wicke
DJ,

On Tue, Feb 24, 2015 at 3:04 AM, Derk-Jan Hartman 
d.j.hartman+wmf...@gmail.com wrote:

 I haven't looked into feature sets and/or requirements at all, but has
 anyone looked into PM2 ?
 https://github.com/Unitech/pm2

 I know we use it internally at my company and that folks are reasonably
 happy with it (compared to the other stuff that is out there).



I looked at pm2 a week ago, and found it interesting (it's listed in the
'see also' section). It does offer clustering as well, so in that respect
it's closer to service-runner. It also has startup scripts for different
environments, similar to forever-service.

There are a few things I dislike about it:

- It tries to replace init for node services, and puts a lot of effort into
the interactive UI. I'm not convinced that this is warranted or useful. I
think services should normally be started and stopped just like any other
system service, without the need to learn a tool that's specific to nodejs.
It is also quite a bit larger than the 380 lines or so in service-runner,
with most of that code spent on interactive things we don't necessarily
want / need.

- Logging is using stdout and stderr, which are both blocking. This means
that a full disk can bring down a service (happened before with Parsoid).
We have since moved to structured JSON logging with logstash over UDP/gelf,
and are careful not to block on logging or metrics.

- There seems to be no metrics reporting apart from the interactive shell.
We want to systematically monitor services and encourage developers to
further instrument their services internally, so this is important to us.

- Its heap limiting feature simply sets v8's old space limit, which means
that processes will have high latency as they spend most of their time in
GC when approaching the limit. We avoid the latency penalty by periodically
monitoring v8's internal memory stats and gracefully restarting the worker
if the limit has been breached several times in a row (and complaining
loudly about it).
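
For the curious, the basic shape of that check is roughly the following
(a simplified sketch, not the actual service-runner code; the limit and the
strike count are illustrative):

    // Runs inside each worker. After several consecutive breaches the
    // worker exits, and the cluster master forks a replacement. A real
    // implementation would first stop accepting connections so that
    // in-flight requests can finish.
    var MAX_HEAP_MB = 300;
    var MAX_STRIKES = 3;
    var strikes = 0;

    setInterval(function() {
        var heapMB = process.memoryUsage().heapUsed / (1024 * 1024);
        if (heapMB > MAX_HEAP_MB) {
            strikes++;
            console.error('Heap limit exceeded (' + Math.round(heapMB) +
                ' MB), strike ' + strikes);
            if (strikes >= MAX_STRIKES) { process.exit(1); }
        } else {
            strikes = 0;
        }
    }, 60 * 1000);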

Gabriel
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Announcing service-runner, a startup module / supervisor for node services

2015-02-24 Thread Gabriel Wicke
Giuseppe,

thanks for having a look.

Regarding 10 lines of JS: The node cluster module
http://nodejs.org/api/cluster.html is part of nodejs core and runs a bit
longer than that. It's actually a fairly elegant way to implement prefork
style servers with support for graceful restarts, sane signal handling etc
without requiring changes in the individual services. It is also not
specific to HTTP, but works with arbitrary socket servers.
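
For readers who haven't used it, the core of the prefork pattern is quite
small (a generic sketch, not service-runner itself):

    // The master forks one worker per CPU and replaces workers that die;
    // all workers transparently share the same listen socket.
    var cluster = require('cluster');
    var http = require('http');
    var os = require('os');

    if (cluster.isMaster) {
        os.cpus().forEach(function() { cluster.fork(); });
        cluster.on('exit', function(worker) {
            console.error('worker ' + worker.process.pid + ' died, restarting');
            cluster.fork();
        });
    } else {
        http.createServer(function(req, res) {
            res.end('handled by pid ' + process.pid + '\n');
        }).listen(8888);
    }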

On Tue, Feb 24, 2015 at 12:02 AM, Giuseppe Lavagetto 
glavage...@wikimedia.org wrote:

 So, is there a way to run a single worker without
 forking out? If not, I guess it would be easy to add this option ('run
 as a single worker on port XXX') to the service-runner.


Yes, you can either set num_workers to 0 in the config, or pass in -n 0 on
the commandline. This is especially useful for small installs and
development / profiling.

What you seem to be hinting at though is a preference for running each
worker on a different port  then using iptables or LVS to distribute
requests across the workers. This model can be supported with
service-runner as well (with -n 0 or 1), but would involve a lot more
moving parts and require solutions for coordinated graceful restarts. Which
compelling benefit do you see in going down that route?

Gabriel
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Html.php line 269

2015-02-18 Thread Gabriel Wicke
I would also recommend against actively trying to emit output that only
barely parses. Any savings after compression should be rather small, and if
only end tags are omitted the DOM will of course still be the same size
after parsing.

In Parsoid we went to some modest lengths
https://github.com/wikimedia/parsoid/blob/master/lib/XMLSerializer.js to
produce polyglot markup http://www.w3.org/TR/html-polyglot/, which is
both valid XML and HTML5. This has enabled consumers to use either XML or
HTML5 parsers, which has proven very useful in practice. For example, this
makes it easier to consume this content using PHP's libxml. Doing the same
in MediaWiki core is admittedly harder, but I still think that we should
follow the robustness principle
https://en.wikipedia.org/wiki/Robustness_principle wherever we can.

Gabriel

On Wed, Feb 18, 2015 at 5:59 PM, Tim Starling tstarl...@wikimedia.org
wrote:

 On 19/02/15 08:43, Gergo Tisza wrote:
  On Wed, Feb 18, 2015 at 1:38 PM, Petr Bena benap...@gmail.com wrote:
 
  (Perhaps wgWellFormedXml is true by default?)
 
 
  It is: https://www.mediawiki.org/wiki/Manual:$wgWellFormedXml

 There was a Bugzilla report and Gerrit change requesting that it be
 set to false:

 https://phabricator.wikimedia.org/T52040
 https://gerrit.wikimedia.org/r/#/c/70036/

 I was against it, partly because of the omitted head tag.

 -- Tim Starling


 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Engineering] Flame graphs in Chrome / Chromium

2015-02-13 Thread Gabriel Wicke
PS: added these links to our growing collection of JS optimization tips at
https://www.mediawiki.org/wiki/Learning_JavaScript#Profiling.

On Fri, Feb 13, 2015 at 5:15 PM, Gabriel Wicke gwi...@wikimedia.org wrote:

 Thanks, Ori!

 Another great tool that we have been using heavily for JS profiling is
 https://github.com/jlfwong/chrome2calltree. It allows you to use the
 excellent KCachegrind profile viewer, which has call graphs, relative call
 frequency, grouping by file  other useful features. It works with profiles
 generated by V8, including Chrome.

 It also powers the https://github.com/gwicke/nodegrind tool, which makes it
 really easy to profile node projects by just calling 'nodegrind script.js'
 instead of 'node script.js'.

 Gabriel

 On Fri, Feb 13, 2015 at 4:56 PM, Ori Livneh o...@wikimedia.org wrote:

 Hello,

 The timeline and flame graph features of Chrome's DevTools have been very
 useful for us as we work to understand and improve the performance of
 VisualEditor. Someone asked me today about how we use these tools, so I
 recorded a short (3-minute) screencast. It unfortunately cut off near the
 end, but only the last sentence or so got clipped.


 https://commons.wikimedia.org/wiki/File:Demonstration_of_Chromium%27s_timeline_feature.webm

 T88590 is a good example of a bug we caught using this feature:

 https://phabricator.wikimedia.org/T88590

 Hope it's useful,

 Ori

 ___
 Engineering mailing list
 engineer...@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/engineering



___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Engineering] Flame graphs in Chrome / Chromium

2015-02-13 Thread Gabriel Wicke
Thanks, Ori!

Another great tool that we have been using heavily for JS profiling is
https://github.com/jlfwong/chrome2calltree. It allows you to use the
excellent KCachegrind profile viewer, which has call graphs, relative call
frequency, grouping by file & other useful features. It works with profiles
generated by V8, including Chrome.

It also powers the https://github.com/gwicke/nodegrind tool, which makes it
really easy to profile node projects by just calling 'nodegrind script.js'
instead of 'node script.js'.

Gabriel

On Fri, Feb 13, 2015 at 4:56 PM, Ori Livneh o...@wikimedia.org wrote:

 Hello,

 The timeline and flame graph features of Chrome's DevTools have been very
 useful for us as we work to understand and improve the performance of
 VisualEditor. Someone asked me today about how we use these tools, so I
 recorded a short (3-minute) screencast. It unfortunately cut off near the
 end, but only the last sentence or so got clipped.


 https://commons.wikimedia.org/wiki/File:Demonstration_of_Chromium%27s_timeline_feature.webm

 T88590 is a good example of a bug we caught using this feature:

 https://phabricator.wikimedia.org/T88590

 Hope it's useful,

 Ori

 ___
 Engineering mailing list
 engineer...@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/engineering


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] GPL upgrading to version 3

2015-02-08 Thread Gabriel Wicke
Tyler,

On Sun, Feb 8, 2015 at 6:11 PM, Tyler Romeo tylerro...@gmail.com wrote:

 However, I will not assume good faith for every other software company out
 there that may take MediaWiki, modify it or improve it in some way, and
 then begin selling it as proprietary software. It's nice to think the world
 is an ideal place where everybody shares their source code, but
 unfortunately we are not living in the ideal, and in fact that is the
 entire reason the GPL was written in the first place: in response to
 companies acting in bad faith.


the GPL (any version) doesn't do anything for the most likely scenario of a
company offering their 'improved' version of MediaWiki as a service. To
actually have real leverage in this case, we'd need to use the AGPL.

However, the AGPL would make it even harder to split out code into
libraries shared with the wider open source community, as very few
third-party users would consider using AGPL-licensed libraries. Even the
consequences of using AGPL-licensed network services like RESTBase seem to
be less clear than I expected, which is why we are in the process of
relicensing the main server code to Apache 2 as well (modules are already
Apache licensed).

If we were an open core company hoping to sell commercial licenses on FUD
I'd advocate for AGPL. Since we aren't & are actually more interested in
collaborating with the outside world I think that Apache 2 makes more sense
than both GPL  AGPL. Re-licensing MediaWiki is not going to happen any
time soon as there are so many copyright holders, but we could try to
re-license library code where possible. I also think that we should
strongly consider using the Apache 2 license for new projects.

Gabriel
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Investigating building an apps content service using RESTBase and Node.js

2015-02-04 Thread Gabriel Wicke
On Tue, Feb 3, 2015 at 11:33 PM, Erik Moeller e...@wikimedia.org wrote:

 I think you will generally find agreement that moving client-side
 transformations that only live in the app to server-side code that
 enables access by multiple consumers and caching is a good idea. If
 there are reasons not do to this, now'd be a good time to speak up.

 If not, then I think one thing to keep in mind is how to organize the
 transformation code in a manner that it doesn't just become a
 server-side hodgepodge still only useful to one consumer, to avoid
 some of the pitfalls Brian mentions. Say you want to reformat
 infoboxes on the mobile web, but not do all the other stuff the mobile
 app does. Can you just get that specific transformation? Are some
 transformations dependent on others?  Or say we want to make a change
 only for the output that gets fed into the PDF generator, but not for
 any other outputs. Can we do that?



Right now the plan is to start from plain Parsoid HTML. The mobile app
service would be called for each new revision to prime the cache / storage.
Chaining transformations might be possible, but right now it's not clear
that it would be worth the complexity. Currently AFAIK only OCG and mobile
apps have strong transformation needs, and there seems to be little overlap
in the way they transform the content. Mobile web still wraps sections into
divs, but we are looking into eliminating that by possibly integrating the
section markup into the regular Parsoid output.

Regarding general-purpose APIs vs. mobile: I think mobile is in some ways a
special case as their content transformation needs are closely coupled with
the way the apps are presenting the content. Additionally, at least until
SPDY is deployed there is a strong performance incentive to bundle
information in a single response tailored to the app's needs. One strategy
employed by Netflix is to introduce a second API layer
http://techblog.netflix.com/2012/07/embracing-differences-inside-netflix.html
on
top of the general content API to handle device-specific needs. I think
this is a sound strategy, as it contains the volatility in a separate layer
while ensuring that everything is ultimately consuming the general-purpose
API. If the need for app-specific massaging disappears over time, we can
simply shut down the custom service / API end point without affecting the
general API.

Gabriel
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] C2.com switches to single-page app distributed nodejs backend

2015-02-02 Thread Gabriel Wicke
The original wiki is getting a technical facelift:

   - http://c2.com/cgi/wiki?WikiWikiSystemNotice
   - http://c2.fed.wiki.org/view/welcome-visitors
   - https://news.ycombinator.com/item?id=8983158

Gabriel
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] From Node.js to Go

2015-01-29 Thread Gabriel Wicke
I'm personally more excited about Rust. It is a true systems language with
a modern type system, does away with the GC for more predictable
performance and generally outperforms Go on CPU-bound tasks. It could
actually become an interesting option for a highly parallel Parsoid 2.0
version once its 1.0 is out of the door. The Mozilla folks have built solid
PEG and DOM libraries, which are important for that task.

In any case, I see little benefit in porting existing Node code to Go right
now. The performance gains are marginal, and the cost of language
fragmentation is real. We have a large number of JS developers, and it
looks like the web is not going to move away from JS any time soon. Modern
JS with promises and generators is also quite a bit nicer than old-style
callbacks, and we are seeing speed-ups of around 10% with io.js.

Gabriel

PS: Wikia are building an auth service in Go
https://github.com/Wikia/helios/, but have otherwise standardized on Java
for now.

On Thu, Jan 29, 2015 at 7:21 PM, Ori Livneh o...@wikimedia.org wrote:

 (Sorry, this was meant for wikitech-l.)

 On Thu, Jan 29, 2015 at 7:20 PM, Ori Livneh o...@wikimedia.org wrote:

  We should do the same, IMO.
  http://bowery.io/posts/Nodejs-to-Golang-Bowery/
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Request for comments for RESTBase?

2015-01-28 Thread Gabriel Wicke
MZMcBride,

the two RFCs that originally discussed RESTBase are:

- https://www.mediawiki.org/wiki/Requests_for_comment/Storage_service
- https://www.mediawiki.org/wiki/Requests_for_comment/Content_API

They were both originally discussed at the arch summit in January 2014.

We have since had two more RFC meetings on the subject, most recently
November 20th, 2014.

Regarding search, the implementation started out as two separate services
(Rashomon and RESTFace), which then morphed into RESTBase as we learned
about the use cases and access patterns.

Hope that helps,

Gabriel

On Wed, Jan 28, 2015 at 7:11 PM, MZMcBride z...@mzmcbride.com wrote:

 Hi.

 There's been quite a bit of discussion about RESTBase lately. Is there a
 request for comments on mediawiki.org about RESTBase? I looked at
 https://www.mediawiki.org/wiki/Requests_for_comment and didn't see one.

 From my limited understanding of what's being proposed, I'd personally be
 a lot more comfortable with the idea if someone from both the software
 architecture side (Brion, Tim, or equivalent) and someone from the
 operations side (Mark, Faidon, Giuseppe, or equivalent) weighed in and
 signed off on what's being proposed. This may have already happened
 somewhere, but I didn't see anything in my brief searching and poking
 around on pages such as https://www.mediawiki.org/wiki/RESTBase.

 MZMcBride



 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Thoughts: stateless services with open servers?

2015-01-28 Thread Gabriel Wicke
On Tue, Jan 27, 2015 at 11:46 AM, Brion Vibber bvib...@wikimedia.org
wrote:

 Another possibility is to shell out to nodejs-based services as an
 alternative to running them as ongoing web services.


I have a hard time imagining a situation where you can install node and
everything else, but would not just apt-get install parsoid
https://www.mediawiki.org/wiki/Parsoid/Setup or mathoid. VMs that can run
MediaWiki with all bells & whistles start at around $3 per month
http://www.ovh.com/us/vps/vps-classic.xml these days, and are likely to
become even cheaper.

I believe we can make installing a fully-featured MediaWiki service system
as simple as copypasting 2-3 lines to a shell, or even executing a remote
installer script that runs those lines for you. Additionally, we can offer
VM images derived from the same install process through Bitnami or others.

To make this happen, we need to evaluate the options, make a decision and
then follow through by making this our recommended and supported
installation mechanism & providing a solid upgrade path for existing
content.

I created https://phabricator.wikimedia.org/T87774 as a high-level tracking
bug for the task of evaluating and deciding on a distribution strategy
targeted at VMs.

Gabriel
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Thoughts: stateless services with open servers?

2015-01-28 Thread Gabriel Wicke
Brad,

On Wed, Jan 28, 2015 at 10:08 AM, Brad Jorsch (Anomie) 
bjor...@wikimedia.org wrote:

 If you're not on Debian or Ubuntu. Although yum install parsoid or
 whatever might work on other Linux distros, what if you're on Windows or
 something more exotic?


I think that we can help most users more if we identify the most important
use cases and focus on making those *really* easy. The installation in
exotic scenarios should still be supported, but I think it's okay if it
involves following some instructions manually.

My gut feel is that most users are in the situation of having a free choice
of hosting provider or distribution. If we can't find a good solution that
satisfies our requirements *and* works well across distributions, then I
think focusing on Debian / Ubuntu should be okay.

Gabriel
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Thoughts: stateless services with open servers?

2015-01-28 Thread Gabriel Wicke
Scott,

On Wed, Jan 28, 2015 at 9:55 AM, Scott MacLeod helia...@gmail.com wrote:

 Gabriel and Wikimedia developers,

 In what ways might you be anticipating developments with SemanticMediaWiki,
 and in what ways not?



I definitely think that the mechanism needs to support the (optional)
installation of SMW or other extensions. Just called this out in
https://phabricator.wikimedia.org/T87774.

I'm not 100% sure if that's what you meant though, so please elaborate if
you had something else in mind.

Gabriel
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Brion's role change within WMF

2015-01-20 Thread Gabriel Wicke
Brion,

I'm glad to have you back on the dark server side!

This is great news for our APIs and the technical evolution of our stack in
general.

Cheers!

Gabriel

On Tue, Jan 20, 2015 at 9:12 AM, Daniel Kinzler dan...@brightbyte.de
wrote:

 On 20.01.2015 at 17:34, Brion Vibber wrote:
  Quick update:
 
  I've had a great experience working on our mobile apps, but it's time to
  get back to core MediaWiki and help clean my own house... now that
 we've
  got Mobile Apps fully staffed I'm leaving the mobile department and will
 be
  reporting directly to Damon within WMF Engineering.

 Hey Brion!

 To me, as a member of the Architecture Committee and someone often
 frustrated
 with the encrusted innards of MediaWiki, this is great news!

 Or as you might say: Awesome!

 Cheers,
 Daniel


 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] S Page debuts as Technical Writer

2015-01-05 Thread Gabriel Wicke
Congratulations, S! I'm looking forward to working with you in your new
role.

Gabriel

On Mon, Jan 5, 2015 at 2:26 PM, Ryan Kaldari rkald...@wikimedia.org wrote:

 Yay! This is great news! I can't think of anyone that would be better for
 this job.

 On Mon, Jan 5, 2015 at 2:23 PM, Rachel Farrand rfarr...@wikimedia.org
 wrote:

  So excited to have you on the team, S!
  Looking forward to skiing-related ECT team offsites too. ;)
  Rachel
 
  On Mon, Jan 5, 2015 at 2:06 PM, Quim Gil q...@wikimedia.org wrote:
 
   It is an honor to announce that S Page[1] has moved from the
  Collaboration
   (Flow) team to join the WMF Engineering Community team as a Technical
   Writer[2]. We were really lucky to find such a great combination of
  English
   communication skills, awareness of MediaWiki documentation pain points,
   more-than-basic MediaWiki development experience, and Wikimedia
 community
   mileage. Besides, S is self-driven, based in San Francisco, and
  accompanies
   almost any reaction with a smile, assets of great value in his new
   position.
  
   Or in his own words: S Page, the old guy in the S.F. office in Jhane
   Barnes shirts[3], feels compelled to write things down; now he'll be
  doing
   it officially as he transfers from the Collaboration (Flow) team to the
   Tech Writer position in the Engineering Community Team. He wrote
   documentation for Sun's window systems and the innovative PenPoint
   operating system and has a serious crush on software developers. When
 not
   lowercasing the Invasion of Officious Pride Capitals, he lives to ski
 and
   snowboard.
  
   S will be responsible for leading the creation and maintenance of
   documentation for third-party application developers and free software
   contributors to Wikimedia projects. The three axes that define his
  initial
   technical writing space are
  
   1. Define the plan for a technical documentation hub[4]
  
   2. Fix the documentation of the Architecture RfC process[5]
  
   3. Work on T2001 (was Bug 1) blocking tasks[6]
  
   While we expect S to write lots of docs, we actually hope that he
 shines
  in
   his role coordinating technical documentation activities involving
   developers and tech-curious editors with different affiliations and
  levels
   of expertise. You’re welcome to make proposals and identify specific
  tasks
   at mw:Talk:Documentation[7], or simply add the #Documentation tag to
 any
   task in Phabricator that affects the documentation.[8]
  
   As if all this were not enough, S Page's first official assignment
 is
   actually one that he carries over from his previous position and that fits
  perfectly
   in the Engineering Community goals: deliver a Trello to Phabricator
   migration script.[9]
  
   Welcome S to your new role and to the Engineering Community team!
  
  
   == References ==
  
   [1] https://meta.wikimedia.org/wiki/User:SPage_(WMF)
   [2] https://www.mediawiki.org/wiki/Engineering_Community_Team
   [3] https://www.google.com/search?q=Jhane+Barnes+shirt&tbm=isch
   [4] https://www.mediawiki.org/wiki/dev.wikimedia.org
   [5] https://phabricator.wikimedia.org/T1107 and more
   [6] https://phabricator.wikimedia.org/T2001
   [7] https://www.mediawiki.org/wiki/Talk:Documentation
   [8] https://phabricator.wikimedia.org/tag/documentation/
   [9] https://phabricator.wikimedia.org/T821
  
   --
   Quim Gil
   Engineering Community Manager @ Wikimedia Foundation
   http://www.mediawiki.org/wiki/User:Qgil
   ___
   Wikitech-l mailing list
   Wikitech-l@lists.wikimedia.org
   https://lists.wikimedia.org/mailman/listinfo/wikitech-l
  ___
  Wikitech-l mailing list
  Wikitech-l@lists.wikimedia.org
  https://lists.wikimedia.org/mailman/listinfo/wikitech-l
 
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Attracting new talent to our projects

2014-12-31 Thread Gabriel Wicke
Perhaps some fun HTTP headers
http://royal.pingdom.com/2012/08/15/fun-and-unusual-http-response-headers/
?
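
For instance (purely illustrative; the header name, wording and URL below are
invented placeholders), something as small as a WSGI middleware could do it:

    # Purely illustrative: a tiny WSGI middleware that attaches a recruiting
    # message as a custom response header. Header name, wording and URL are
    # invented placeholders, not an actual Wikimedia header.
    def recruiting_header(app):
        def wrapped(environ, start_response):
            def custom_start_response(status, headers, exc_info=None):
                headers = list(headers) + [(
                    "X-Recruiting",
                    "Enjoy reading HTTP headers? Come hack on MediaWiki: "
                    "https://example.org/jobs",
                )]
                return start_response(status, headers, exc_info)
            return app(environ, custom_start_response)
        return wrapped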

Gabriel

On Wed, Dec 31, 2014 at 4:19 PM, Jon Robson jrob...@wikimedia.org wrote:

 Recently on https://developer.mozilla.org/ I noticed an easter egg
 when you pop open the JavaScript console.

 snip
 Interested in having a direct impact on hundreds of millions of users?
 Join
 Mozilla, and become part of a global community that’s helping to build a
 brighter future for the Web.
 snip

 I'm curious how successful this is, but I wonder whether, if we did the
 same, we might see some new contributors popping up?

 Why don't we have a similar message linking to our mailing list /
 phabricator instance / mediawiki.org homepage?

 Thoughts?

 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Fwd: find previous section title

2014-12-29 Thread Gabriel Wicke
Moritz,

you can certainly do this in HTML, either using the PHP parser output or
Parsoid. Parsoid output makes it easier to identify math extension output.
If you need the wikitext for the heading, then Parsoid can also give you
the source offsets of that heading in data-parsoid (see the dsr property in
there; it encodes startOffset, endOffset, startTagWidth, endTagWidth).
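
As a rough illustration (assuming the data-parsoid attributes are inlined
into the HTML, which is an option rather than the default; details
simplified), one could recover the wikitext of each heading like this:

    # Minimal sketch: pull the dsr offsets of headings out of Parsoid HTML
    # (assuming inlined data-parsoid attributes) and slice the original
    # wikitext with them. Simplified for illustration.
    import json
    from html.parser import HTMLParser

    class HeadingOffsets(HTMLParser):
        def __init__(self):
            super().__init__()
            self.dsrs = []

        def handle_starttag(self, tag, attrs):
            if tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
                dp = json.loads(dict(attrs).get("data-parsoid", "{}"))
                dsr = dp.get("dsr")  # [startOffset, endOffset, startTagWidth, endTagWidth]
                if dsr:
                    self.dsrs.append(dsr)

    def heading_wikitext(parsoid_html, wikitext):
        p = HeadingOffsets()
        p.feed(parsoid_html)
        # The first two dsr entries are source offsets into the wikitext.
        return [wikitext[start:end] for start, end, _, _ in p.dsrs]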

Gabriel
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Revision metadata as a service?

2014-11-05 Thread Gabriel Wicke
Erik,

On 11/05/2014 10:07 AM, Erik Moeller wrote:
 I'm wondering if a lightweight service that satisfies the following
 requirements might be a good idea:
 
 - community-created schemas (similar to the EventLogging schemas on meta)
 - basic per-user authentication/authorization
 - basic namespacing (e.g. WikiProject Medicine:Quality refers to a
 specific schema + specific permissions)


The need for storing different formats and metadata per revision was
actually one of the motivations for creating RESTBase [1]. Currently it is
set up to store html, wikitext, data-parsoid and data-mw per revision, with
each property being stored in its own bucket behind the scenes. It is
possible to add new revisioned buckets for new types of content with a
simple PUT, and the plan is to have separate ACLs per bucket.

What are the indexing requirements for this metadata? If fast access by
specific properties is needed, then using tables would make more sense, as
we'll then be able to leverage secondary indexing. Tables have the same
properties as buckets, and can also be created with a PUT of the schema.
Query results are returned as JSON.
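
For illustration, a rough sketch of that kind of interaction in Python. The
base URL, bucket name, bucket type and payload fields are hypothetical and
only stand in for the real RESTBase interface:

    # Hypothetical sketch: declare a revisioned bucket with a PUT of its
    # schema, store a per-revision metadata blob, and read it back as JSON.
    import json
    import requests

    BASE = "https://rest.example.org/example.wikipedia.org/v1"  # hypothetical

    # Declare a new revisioned key/value bucket for a custom property.
    bucket_schema = {"type": "kv_rev", "valueType": "json"}  # invented fields
    requests.put(BASE + "/buckets/quality-assessments",
                 json=bucket_schema).raise_for_status()

    # Store a metadata blob for one revision of one page.
    payload = {"project": "WikiProject Medicine", "quality": "B"}
    requests.put(BASE + "/buckets/quality-assessments/Article_title/12345",
                 data=json.dumps(payload),
                 headers={"Content-Type": "application/json"}).raise_for_status()

    # Query results come back as JSON.
    print(requests.get(
        BASE + "/buckets/quality-assessments/Article_title/12345").json())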

A limitation of queries in RESTBase is that they can only use indexes
defined in the schema. If ad-hoc queries on arbitrary combinations of
attributes are needed, then ElasticSearch would be more suitable.

 If such a service existed, community members, researchers and
 occasionally WMF itself could create their own tools/gadgets that use
 this service, perhaps with a lightweight global approval process.
 
 If this seems like a good idea, I'd be curious about implementation
 strategies -- are we blocked on something like SOA Auth [1] to
 implement this as a standalone service? My sense is that you'd want to
 pull this out of MediaWiki for maximum flexibility and simplicity.

It might be possible to improvise a bit, but we'll need basic SOA auth
fairly soon for other use cases too. I'm optimistic that we can start small
though, especially if this doesn't need to tie into browser-based SUL
straight away.

Gabriel

[1]: https://github.com/gwicke/restbase

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

[Wikitech-l] Introducing Math rendering 2.0

2014-10-23 Thread Gabriel Wicke
Dear Wikipedians,

We'd like to announce a major update of the Math (rendering) extension.

For registered Wikipedia users, we have introduced a new math rendering
mode using MathML, a markup language for mathematical formulae. Since MathML
is not supported in all browsers [1], we have also added a fall-back mode
using scalable vector graphics (SVG).

Both modes offer crisp rendering at any resolution, which is a major
advantage over the current image-based default. We'll also be able to make
our math more accessible by improving screenreader and magnification support.

We encourage you to enable the MathML mode in your Appearance preferences.
As an example, the URL for this section on the English Wikipedia is:

  https://en.wikipedia.org/wiki/Special:Preferences#mw-prefsection-rendering

For editors, there are also two new optional features:

1) You can set the id attribute to create math tags that can be
referenced. For example, the following math tag

<math id="MassEnergyEquivalence">
E=mc^2
</math>

can be referenced by the wikitext

[[#MassEnergyEquivalence|mass energy equivalence]]

This is true regardless of the rendering mode used.

2) In addition, there is the attribute "display" with the possible values
"block" or "inline". This attribute can be used to control the layout of the
math tag with regard to centering and the size of the operators. See
https://www.mediawiki.org/wiki/Extension:Math/Displaystyle
for a full description of this feature.

Your feedback is very welcome. Please report bugs in Bugzilla against the
Math extension, or post on the talk page here:
https://www.mediawiki.org/wiki/Extension_talk:Math

All this is brought to you by Moritz Schubotz and Frédéric Wang (both
volunteers) in collaboration with Gabriel Wicke, C. Scott Ananian,
Alexandros Kosiaris and Roan Kattouw from the Wikimedia Foundation. We also
owe a big thanks to Peter Krautzberger and Davide P. Cervone of MathJax for
the server-side math rendering backend.

Best,

Gabriel Wicke (GWicke) and Moritz Schubotz (Physikerwelt)


[1]: Currently MathML is supported by Firefox & other Gecko-based browsers,
and accessibility tools like Apple's VoiceOver. There is also partial
support in WebKit.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Wrapping signatures with a span for discoverability

2014-10-03 Thread Gabriel Wicke
On 09/30/2014 05:23 PM, Erik Bernhardson wrote:
 This was written[1] for Echo a couple years ago at the beginning of the
 project.  This particular implementation is far from perfect, but here are
 a couple of the complexities involved:
 
 * a single regexp doesn't currently match timestamps in different
 languages, so a timestamp regex is generated based on the $wgContLang
 timestamp output.
 * wikis each control their own signature[2]. Changing the signature
 exposed a bug[3] in Echo which caused it to stop sending mention
 notifications.
 * The fix[4] for above basically switches the code around to extract
 wikilinks from the wikitext and run their content through Title to
 determine if a link is to NS_USER, NS_USER_TALK, or the Contributions page
 of NS_SPECIAL, all of which appear in signatures.
 
 From the standpoint of programmatically detecting a signature, the above
 could be cleaned up and work well enough.

I agree. If we can solve this with a reasonable amount of tech and elbow
grease in a way that works for old discussions too, then I think we should
do so.
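
To make the idea concrete, here is a toy sketch of that kind of matching in
Python. The timestamp pattern is hard-coded for English and the link pattern
is simplified; the real code builds its regex from the $wgContLang output and
resolves link targets through Title, as Erik describes above:

    # Toy sketch: does a line of wikitext end in something signature-shaped,
    # i.e. a user-related link followed by an (English-format) timestamp?
    # Hard-coded patterns for illustration only.
    import re

    TIMESTAMP = r"\d{2}:\d{2}, \d{1,2} \w+ \d{4} \(UTC\)"
    USER_LINK = r"\[\[(?:User|User talk|Special:Contributions)[:/][^\]|]+(?:\|[^\]]*)?\]\]"
    SIGNATURE = re.compile(USER_LINK + r".{0,80}?" + TIMESTAMP + r"\s*$")

    def ends_with_signature(line):
        return bool(SIGNATURE.search(line))

    print(ends_with_signature(
        "Sounds good. [[User:Example|Example]] ([[User talk:Example|talk]]) "
        "12:34, 5 October 2014 (UTC)"))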

Gabriel

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] [Engineering] RfC Discussions Today/Next week

2014-09-03 Thread Gabriel Wicke
On 09/03/2014 11:08 AM, Rachel Farrand wrote:
 
 Hello, 
 If you are interested in joining today's RfC discussion,
 the Architecture Committee will be discussing the following RfCs:  
 
 * SOA Authentication (Chris Steipp)
 https://www.mediawiki.org/wiki/Requests_for_comment/SOA_Authentication

The log for this part is now posted at:

https://www.mediawiki.org/wiki/Talk:Requests_for_comment/SOA_Authentication#RFC_meeting_2014-09-03

Thanks for the discussion!

Gabriel

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
