Re: [Wikitech-l] Parsoid still doesn't love me

2015-11-09 Thread Brad Jorsch (Anomie)
On Fri, Nov 6, 2015 at 3:29 PM, Ricordisamoa wrote:

> What if I need to get all revisions (~2000) of a page in Parsoid HTML5?
> The prop=revisions API (in batches of 50) with mwparserfromhell is much
> quicker.
>

That's a tradeoff you get with a highly cacheable REST API: you generally
have to fetch each 'thing' individually rather than being able to batch
queries.

If you already know how to individually address each 'thing' (e.g. you
fetch the list of revisions for the page first) and the REST API's ToS
allow it, multiplexing requests might reduce the impact of the
limitation. If you have to rely on "next" and "previous" links in the
content to address adjacent 'things', HATEOAS-style, you're probably out
of luck.
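
Off the top of my head, a rough Python sketch of that multiplexed
approach (untested; the action API parameters and the REST
/page/html/{title}/{revid} path are from memory, so double-check the
docs and mind the ToS before hammering the servers):

    import concurrent.futures
    from urllib.parse import quote

    import requests

    API = "https://en.wikipedia.org/w/api.php"
    REST = "https://en.wikipedia.org/api/rest_v1/page/html"

    def revision_ids(title):
        """Yield all revision IDs for a page, 50 per action API request."""
        params = {"action": "query", "prop": "revisions", "titles": title,
                  "rvprop": "ids", "rvlimit": 50,
                  "format": "json", "formatversion": 2}
        while True:
            data = requests.get(API, params=params).json()
            for rev in data["query"]["pages"][0]["revisions"]:
                yield rev["revid"]
            if "continue" not in data:
                break
            params.update(data["continue"])  # standard continuation dance

    def parsoid_html(title, revid):
        # One REST request per revision; parallelism makes up for the
        # lack of batching.
        url = "%s/%s/%d" % (REST, quote(title, safe=""), revid)
        return requests.get(url).text

    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
        htmls = list(pool.map(lambda r: parsoid_html("Example", r),
                              revision_ids("Example")))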


-- 
Brad Jorsch (Anomie)
Senior Software Engineer
Wikimedia Foundation

[Wikitech-l] Summit proposal: Turning the Table of Contents into a discrete object

2015-11-09 Thread Isarra Yos
Hi! I would like to turn the MediaWiki ToC into a discrete object within
the codebase: write a ToC class, pull all the random building parts out
of the parser and five levels of page output, and make it stop messing
up the page caching. Make this class a thing, separate from the content
itself, that extensions can show, toggle, mess with, add to, or move.


I have a proposal about this for the developers summit, which is about
as specific as the above: https://phabricator.wikimedia.org/T114057


Please come discuss. Would this affect what you're doing in a good or
bad way? What do we currently know of that this should support? What
would we, as developers or whichever other buckets we fall into, want
out of it?


Also, is this the sort of thing you normally use an RfC for? I'm a
designer, so I'm just asking questions and soliciting stories and all
that before I go trying to do designy stuff on the MediaWiki backend,
but maybe that's not really the way to do this here.


-I


[Wikitech-l] people.wikimedia.org moved, access for all shell users

2015-11-09 Thread Daniel Zahn
Hi,

crossposting from the operations list, so that all shell users see it.

This is to let you know that the service
https://people.wikimedia.org
has moved to a new backend server: from terbium to
"rutherfordium.eqiad.wmnet", which is a Ganeti VM.

Also, all shell users have access now. We no longer limit it to
deployers, this service is now completely separate from any MediaWiki
maintenance work happening on terbium, and terbium can be upgraded to
HHVM.

If you are an existing user:

Please just switch from using terbium.eqiad.wmnet to the new backend
rutherfordium.eqiad.wmnet.

All files have been copied from terbium with rsync, so you should not
have to copy anything manually, and all URLs should still work.

You just have to connect to the new host to update files. Both home
directories are backed up in Bacula.

If you did not have access to this before, but have any kind of shell
access:

Now you also have access to people.wikimedia.org and can have a URL like
https://people.wikimedia.org/~youruser (as opposed to just deployers
having this feature in the past).

To upload files, copy them (with scp) to rutherfordium.eqiad.wmnet into a
directory called "public_html".

If that doesn't exist yet, simply create it with "mkdir public_html".
Files in that directory will be publicly accessible at
https://people.wikimedia.org/~youruser/yourfile.
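
For example, with a hypothetical username "youruser":

    ssh youruser@rutherfordium.eqiad.wmnet mkdir -p public_html
    scp yourfile youruser@rutherfordium.eqiad.wmnet:public_html/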

I also added a message to SAL and will update
https://wikitech.wikimedia.org/wiki/People.wikimedia.org right now.

-- 
Daniel Zahn 
Operations Engineer

Re: [Wikitech-l] Parsoid still doesn't love me

2015-11-09 Thread Subramanya Sastry

On 11/09/2015 12:37 PM, Petr Bena wrote:

> Do you really want to say that reading from disk is faster than
> processing the text using CPU? I don't know how complex the syntax of
> mw actually is, but if that's true, C++ compilers, which are themselves
> very slow, are probably much faster than Parsoid.
>
> What takes so much CPU time in turning wikitext into html? Sounds like
> JS wasn't the best choice here.


The problem is not turning wikitext into HTML, but turning it into HTML
in such a way that it can be turned back into wikitext when it is
edited, without introducing dirty diffs.


That requires keeping state around, tracking the wikitext closely, and
doing a lot more analysis.
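
To make that concrete, here's a toy sketch (mine, grossly simplified,
not Parsoid's actual code) of why state is needed: two different
wikitext spellings of a link produce identical HTML, so a serializer
that kept no notes about the original source could not round-trip both
faithfully.

    def wt2html(wikitext):
        # Toy link parser that, like a renderer, normalizes whitespace away.
        target, _, label = wikitext.strip("[]").partition("|")
        return '<a href="/wiki/%s">%s</a>' % (target.strip(),
                                              (label or target).strip())

    print(wt2html("[[Foo|bar]]"))   # <a href="/wiki/Foo">bar</a>
    print(wt2html("[[Foo |bar]]"))  # identical output: the extra space is
                                    # lost, so converting back to wikitext
                                    # can't restore it without saved state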


That means detecting markup errors and retaining error-recovery
information, both so that you can account for it during analysis and so
that you can reintroduce those markup errors when you convert the HTML
back to wikitext. This is why we proposed
https://phabricator.wikimedia.org/T48705, since we already have all the
information about broken wikitext usage.


If you are interested in more details, either show up on
#mediawiki-parsoid, or look at this April 2014 tech talk: "A preliminary
look at Parsoid internals" [Slides, Video].


So, TL;DR: Parsoid is a *bi-directional* wikitext <-> HTML bridge, and
doing that is non-trivial.


Subbu.

Re: [Wikitech-l] Parsoid still doesn't love me

2015-11-09 Thread C. Scott Ananian
On Mon, Nov 9, 2015 at 1:37 PM, Petr Bena wrote:

> Do you really want to say that reading from disk is faster than
> processing the text using CPU? I don't know how complex the syntax of
> mw actually is, but if that's true, C++ compilers, which are themselves
> very slow, are probably much faster than Parsoid.
>
> What takes so much CPU time in turning wikitext into html? Sounds like
> JS wasn't the best choice here.

More fundamentally, the parsing task involves recursive expansion of
templates and image information queries, and popular Wikipedia articles
can involve hundreds of templates and image queries. Caching the result
of parsing lets us avoid repeating these nested queries, which are a
major contributor to parse time.

One of the benefits of the Parsoid DOM representation[*] is that it will
allow in-place updates of templates and image information, so that
updating pages after a change can be done by simple substitution,
*without* repeating the actual "parse wikitext" step.
  --scott
[*] This actually requires some tweaks to the wikitext of some popular
templates; https://phabricator.wikimedia.org/T114445 is a decent summary
of the work (although be sure to read to the end of the comments; there's
significant stuff there which I haven't edited into the top-level task
description yet).
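
To illustrate the idea with a sketch (mine, not Parsoid's actual
implementation): Parsoid's output marks the first node of each
transclusion with typeof="mw:Transclusion" and groups all of its
rendered nodes with a shared "about" attribute, so a template edit only
needs to touch those nodes. Here re_expand() is a hypothetical callback
into the template expander:

    from lxml import html

    def refresh_transclusions(page_html, re_expand):
        doc = html.fromstring(page_html)
        for node in doc.xpath('//*[@typeof="mw:Transclusion"]'):
            about = node.get("about")
            if about:
                # Drop the other nodes belonging to this expansion...
                for extra in doc.xpath('//*[@about=$a]', a=about)[1:]:
                    extra.getparent().remove(extra)
            # ...and swap in the fresh rendering; the rest of the page's
            # parsed HTML is untouched, with no full reparse.
            new = html.fragment_fromstring(re_expand(node),
                                           create_parent="div")
            node.getparent().replace(node, new)
        return html.tostring(doc, encoding="unicode")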

-- 
(http://cscott.net)

Re: [Wikitech-l] Parsoid still doesn't love me

2015-11-09 Thread Ricordisamoa

On 09/11/2015 15:52, Brad Jorsch (Anomie) wrote:

> On Fri, Nov 6, 2015 at 3:29 PM, Ricordisamoa wrote:
>
>> What if I need to get all revisions (~2000) of a page in Parsoid HTML5?
>> The prop=revisions API (in batches of 50) with mwparserfromhell is much
>> quicker.
>
> That's a tradeoff you get with a highly cacheable REST API: you
> generally have to fetch each 'thing' individually rather than being
> able to batch queries.
>
> If you already know how to individually address each 'thing' (e.g. you
> fetch the list of revisions for the page first) and the REST API's ToS
> allow it, multiplexing requests might reduce the impact of the
> limitation. If you have to rely on "next" and "previous" links in the
> content to address adjacent 'things', HATEOAS-style, you're probably
> out of luck.

All of this seems overkill in the first place...


Re: [Wikitech-l] Parsoid still doesn't love me

2015-11-09 Thread Petr Bena
Do you really want to say that reading from disk is faster than
processing the text using CPU? I don't know how complex the syntax of mw
actually is, but if that's true, C++ compilers, which are themselves
very slow, are probably much faster than Parsoid.

What takes so much CPU time in turning wikitext into html? Sounds like
JS wasn't the best choice here.

On Fri, Nov 6, 2015 at 11:37 PM, Gabriel Wicke  wrote:
> We don't currently store the full history of each page in RESTBase, so your
> first access will trigger an on-demand parse of older revisions not yet in
> storage, which is relatively slow. Repeat accesses will load those
> revisions from disk (SSD), which will be a lot faster.
>
> With a majority of clients now supporting HTTP2 / SPDY, use cases that
> benefit from manual batching are becoming relatively rare. For a use case
> like revision retrieval, HTTP2 with a decent amount of parallelism should
> be plenty fast.
>
> Gabriel
>
> On Fri, Nov 6, 2015 at 2:24 PM, C. Scott Ananian 
> wrote:
>
>> I think your subject line should have been "RESTBase doesn't love me"?
>>  --scott
>>
> --
> Gabriel Wicke
> Principal Engineer, Wikimedia Foundation

Re: [Wikitech-l] Forking, branching, merging, and drafts on Wikipedia

2015-11-09 Thread C. Scott Ananian
Apologies for the summit proposal reading like a manifesto.  Drafts
are a big use case, as is offline editing.  Flagged revisions might
use this as well. As a feature request it dates back to the dark days
of the wiki.  It certainly is an enabler for a lot of different
editing/revision/collaboration models that people have proposed over
the years.
 --scott


Re: [Wikitech-l] Forking, branching, merging, and drafts on Wikipedia

2015-11-09 Thread Isarra Yos

On 07/11/15 00:32, David Gerard wrote:

> On 7 November 2015 at 00:29, Brian Wolff wrote:
>
>> I feel like different people want different things, and what is really
>> needed is a user-centric discussion of use-cases to drive a feature
>> wishlist, not any sort of discussion about implementation.
>
> Yes. I see the rationale in that Phabricator ticket, but it reads like
> personal ideology without reference to the Wikimedia projects. What is
> the use case?


Yeah, we need to figure out who all these different groups are, too. Who
is doing similar things currently, and who would have a use for this?
Who already knows they want to do these things, and what can we ask
them?



Re: [Wikitech-l] Summit proposal: Turning the Table of Contents into a discrete object

2015-11-09 Thread Pine W
I lean in favor of this concept. Can someone from Performance comment in
Phabricator?

Thanks!
Pine
On Nov 9, 2015 7:26 PM, "Isarra Yos"  wrote:

> Hi! I would like to turn the mw ToC into a discrete object within the
> codebase. Write a ToC class and pull all the random building parts out of
> the parser and five levels of pageoutput, and make it stop messing up the
> page caching and stuff. Make this class a thing, separate from the content
> itself, that can appear on the page or be toggled or messed with or added
> to or moved or whatever by extensions.
>
> I have a proposal about this for the developers summit which is about as
> specific: https://phabricator.wikimedia.org/T114057
>
> Please come discuss. Would this affect what you're doing in a good or bad
> way? What do we know of that this should support at present? What would we,
> as developers or whatever the buckets, want out of it?
>
> Also is this the sort of thing you normally use an RfC for? I'm a designer
> so I'm just asking questions and soliciting stories and all that before I
> go trying to do designy stuff on the mw backend, but maybe that's not
> really the way to do this here.
>
> -I
>