Re: [Wikitech-l] Timelines for Parsoid/PHP to replace legacy PHP parser

Lord_Farin Mon, 13 Jan 2020 11:54:29 -0800

Thank you so much for the very extensive reply! That's all I hoped to get,
and so much more. Much appreciated!


On Sun, 12 Jan 2020 at 15:37, Subramanya Sastry <[email protected]>
wrote:

> On 1/12/20 5:33 PM, Lord_Farin wrote:
>
> > Hi Wikitech,
> >
> > I've been catching up on the recent achievements regarding Parsoid/PHP,
> > well done!
>
> Thanks!
>
> The switchover of wikitext engines is going to take some time. I would
> be surprised if we got all the ducks lined up before 18 months from now
> -- we have a bunch of work to do still.
>
> Other details below:
>
> > With WMF sites being migrated, of course non-WMF sites start to creep into
> > the picture. As I'm involved in running one of those, I'm curious to know
> > if and how you are going to support this upgrade. I've read about Linter
> > and ParserMigration but I'm not clear on how they fit into the picture.
>
> We built the Linter and ParserMigration extensions to support the
> replacement of HTML4 Tidy with RemexHTML [1]. We anticipate leveraging
> those in our efforts to consolidate behind Parsoid (post-unification
> work) as the default wikitext engine for MediaWiki. We don't quite know
> the specifics yet. My hunch is that this replacement is going to be most
> complex for Wikimedia wikis, and I expect most 3rd party wikis to have a
> much easier time switching over.
>
> > I'm asking specifically because we are running some custom extensions which
> > will probably break with the advent of Parsoid/PHP.
> > At present we are running MW 1.33 on PHP 7.0, but we are not using VE.
> > It would be fine if we as a maintenance team have to invest some (or even
> > considerable) time and effort but I would like to know the size of the
> > endeavour beforehand...
>
> One of the changes that will take some work is how extensions interact
> with the parser (Parsoid in the future). So far, this happens through
> access to the Parser object as well as through parser hooks. However, in
> the Parsoid regime, this model will change. While the details are yet to
> be finalized and we are yet to publish the first draft for review
> (likely in the next couple months), here is how we've been thinking
> about this:
>
> 1. Extensions will no longer have direct access to the parser itself --
> all interaction will be through an API / interface.
>
> 2. Hooks are unlikely to be based on timelines of how wikitext passes
> through the parser, i.e. before something happens, or after something
> happens. We are going to move more towards a pure functional model as
> far as possible. So, as far as extension tags are concerned, they get
> access to the tag source, args, and possibly some other information and
> are expected to return output HTML / a DOM fragment (here they will
> leverage the parser API/interface I mention in 1. above). Most
> extensions that implement custom tags already behave in this manner and
> this simply formalizes that.
>
> 3. Some extensions set parser state and update it across invocations. We
> currently have no intention of supporting that. We are going to look at
> what the underlying need is that is being modeled through side-effects /
> state, and will try to provide first-class support for that in some manner.
> For example, some (like Cite) use state for enumeration and numbering
> purposes, and this can be done as a post-processing pass on the DOM when
> they get to inspect the "final" DOM. Presumably these global document
> processors are the exception, not the norm. But, statelessness lets us
> process the document in arbitrary order (or even skip processing parts
> of the document by reusing extension/template/media output from previous
> versions of the document), and use the final post-processing step as the
> synchronization step to enforce source-text ordering (like numbering).
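For illustration, here is a minimal Python sketch of principles 2 and 3, assuming a Cite-like <ref> tag. It is not Parsoid's API (Parsoid is PHP, and its interfaces are still in flux); ref_handler, number_refs and the ref-marker markup are hypothetical names. The handler is a pure function of the tag source and args and emits an unnumbered marker; numbering happens only in one post-processing pass over the composed document, so fragments can be rendered in any order.

```python
import html
import re

# Hypothetical marker format; not Parsoid's real output.
MARKER = re.compile(
    r'<sup class="ref-marker" data-name="([^"]*)" data-body="([^"]*)"></sup>'
)


def ref_handler(source: str, args: dict) -> str:
    """Pure tag handler: output depends only on the tag source and args.

    No number is emitted here, so handlers can run in any order (or their
    output can be reused from an earlier render) without shared state.
    """
    name = html.escape(args.get("name", ""))
    body = html.escape(source)
    return f'<sup class="ref-marker" data-name="{name}" data-body="{body}"></sup>'


def number_refs(document: str) -> str:
    """Post-processing pass over the composed document.

    Assigns numbers in source order; a repeated named ref reuses its number.
    """
    numbers = {}
    counter = 0

    def assign(match: re.Match) -> str:
        nonlocal counter
        name, body = match.group(1), match.group(2)
        if name and name in numbers:
            n = numbers[name]
        else:
            counter += 1
            n = counter
            if name:
                numbers[name] = n
        return f'<sup class="reference" title="{body}">[{n}]</sup>'

    return MARKER.sub(assign, document)


refs = [("First source", {}), ("Second source", {"name": "b"}), ("Same as b", {"name": "b"})]
# Render the fragments in reverse order to show that order does not matter...
rendered = {i: ref_handler(src, a) for i, (src, a) in reversed(list(enumerate(refs)))}
# ...then compose them in source order and number them in one final pass.
doc = "Alpha" + rendered[0] + " beta" + rendered[1] + " gamma" + rendered[2]
print(number_refs(doc))
```

Because the handler never sees a counter, its output could also be cached and reused across page versions; only the final numbering pass has to run again after an edit.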
>
> We anticipate most extensions are going to need some (hopefully minor)
> changes. If your extension doesn't deal with wikitext itself, the
> changes are probably going to be relatively minor. But, if your
> extension deals with wikitext, then it might need an update in terms of
> how it generates its output (using the ParsoidExtensionAPI interface
> instead of an actual parser object), but once again, these are unlikely to
> be very significant changes. However, if your extension maintains state
> across invocations, then it might need some rethink (as stated in 3.
> above).
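As a hypothetical sketch of point 1 and of the migration described above (again in Python rather than PHP): a tag handler that receives a narrow API object instead of the parser. ExtensionAPI and its wikitext_to_html method are invented stand-ins, not the actual methods of ParsoidExtensionAPI. Depending only on the interface also means the handler can be tested against a stub.

```python
import html
from typing import Mapping, Protocol


class ExtensionAPI(Protocol):
    """Hypothetical narrow interface handed to extensions instead of the parser."""

    def wikitext_to_html(self, wikitext: str) -> str: ...


def infobox_handler(api: ExtensionAPI, source: str, args: Mapping[str, str]) -> str:
    """Hypothetical tag handler that only talks to the API object."""
    title = html.escape(args.get("title", ""))
    # Nested wikitext is expanded through the interface, not by reaching into
    # parser internals or mutating parser state.
    body = api.wikitext_to_html(source)
    return f'<div class="infobox"><b>{title}</b>{body}</div>'


class StubAPI:
    """Test double: wraps the text in <p> instead of really parsing it."""

    def wikitext_to_html(self, wikitext: str) -> str:
        return f"<p>{wikitext}</p>"


print(infobox_handler(StubAPI(), "Some ''text''", {"title": "Demo"}))
```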
>
As our extension currently does some hacks to get correct behaviour when
modifying the ToC, I guess that the new model will make things a fair bit
easier. As stated, it will no longer be necessary to work with wikitext;
instead, a hook at the post-processing level should be sufficient for that.
This also makes all the templating much easier.
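A minimal sketch of a post-processing approach to the ToC, assuming the hook receives the final document as well-formed XHTML (a real implementation would need an HTML5-aware DOM). toc_post_processor and the toclevel-N classes are hypothetical; the point is only that the ToC is derived from headings in the final DOM rather than from wikitext.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Heading tags considered part of the ToC, mapped to their level.
HEADING_LEVELS = {"h2": 2, "h3": 3, "h4": 4}


def collect_headings(root: ET.Element) -> List[Tuple[int, str]]:
    """Walk the final DOM in document order and collect (level, text) pairs."""
    return [
        (HEADING_LEVELS[el.tag], "".join(el.itertext()).strip())
        for el in root.iter()
        if el.tag in HEADING_LEVELS
    ]


def toc_post_processor(document: str) -> str:
    """Hypothetical post-processing hook: prepend a flat ToC list to the body.

    Runs once on the composed document, so no wikitext or parser state is
    involved. Nesting by level is left out to keep the sketch short.
    """
    root = ET.fromstring(document)
    toc = ET.Element("ul", {"class": "toc"})
    for level, text in collect_headings(root):
        item = ET.SubElement(toc, "li", {"class": f"toclevel-{level - 1}"})
        item.text = text
    root.insert(0, toc)
    return ET.tostring(root, encoding="unicode")


page = "<body><h2>Intro</h2><p>a</p><h3>Detail</h3><h2>End</h2></body>"
print(toc_post_processor(page))
```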

In summary, I'm looking forward to the exact implementation, and while I
expect considerable work, it will be nowhere near the complexity of what we
have currently built; just work migrating from A to B.


> If you want to get a really early look, you can poke around the Parsoid
> repo and its reimplementation of a few extensions [2]. But, note that we
> still have some work to do to (a) clean up the interfaces, (b) untangle
> them further from Parsoid's internals and (c) make sure our design is
> consistent with Tim's proposed work around hooks in general [3] [4]. So,
> what you see in the Parsoid repo today may not be what it will look like
> in the end (in terms of exact interfaces - names, methods, signatures),
> but they will nevertheless operate within the constraints / principles
> 1-3 above.
>
> As a long-term goal, we are trying to nudge wikitext (including
> templates, extensions) towards one where the final output is a
> composition of largely independent fragments (no matter who/what
> generated those fragments) with some mostly minor post-processing after
> the document is composed. An updated extension and parser hooks API
> during the switch to Parsoid is one of the first steps. Balanced / Typed
> templates will be the next step in that direction [5].
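The composition model described above can be sketched in a few lines of Python. Everything here is hypothetical (FragmentCache, compose, the stand-in render function): fragments are rendered independently by a pure function, cached by their input, joined in source order, and passed through a single post-processing step. Re-rendering an edited version only recomputes the fragments whose source changed.

```python
import html
from typing import Callable, Dict, List, Tuple

Fragment = Tuple[str, str]  # (kind, source), e.g. ("template", "{{Infobox}}")


class FragmentCache:
    """Hypothetical store of rendered fragments keyed by (kind, source).

    Reuse is only safe because the renderer is pure: same input, same output.
    """

    def __init__(self, render: Callable[[str, str], str]) -> None:
        self.render = render
        self.store: Dict[Fragment, str] = {}
        self.render_calls = 0

    def get(self, frag: Fragment) -> str:
        if frag not in self.store:
            self.render_calls += 1
            self.store[frag] = self.render(*frag)
        return self.store[frag]


def compose(fragments: List[Fragment], cache: FragmentCache,
            post_process: Callable[[str], str]) -> str:
    """Render or reuse each fragment independently, join them in source
    order, then run the single document-level post-processing step."""
    return post_process("".join(cache.get(f) for f in fragments))


def render(kind: str, source: str) -> str:
    # Stand-in renderer; a real one would expand templates, extensions, etc.
    return f'<span data-kind="{kind}">{html.escape(source)}</span>'


def wrap(doc: str) -> str:
    # Stand-in for document-level post-processing.
    return f"<div>{doc}</div>"


cache = FragmentCache(render)
v1 = [("text", "intro "), ("template", "{{Infobox}}"), ("text", " outro")]
compose(v1, cache, wrap)
# An edit touches only the last fragment; the other two are reused.
v2 = [("text", "intro "), ("template", "{{Infobox}}"), ("text", " new outro")]
html_v2 = compose(v2, cache, wrap)
print(cache.render_calls)  # 4
```

Keying the cache on each fragment's input is what makes reuse across versions safe; an extension that carried state between invocations would silently break it.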
>
> Hope this helps in planning early. Thanks for asking - it nudged me to
> outline our thinking early even before we have the publishable first
> draft of our updated extension model.
>
> Subbu (on behalf of the Parsing Team).
>
> [1] https://blog.wikimedia.org/2018/07/09/tidy-html5-replacement/
>
> [2] https://github.com/wikimedia/parsoid/tree/master/src/Ext
>
> [3] https://lists.wikimedia.org/pipermail/wikitech-l/2019-December/092867.html
>
> [4] https://phabricator.wikimedia.org/T240307
>
> [5] https://phabricator.wikimedia.org/T114445
>
>
Reading some more context (some of which I had been following before) and
seeing it come together into a coherent plan of action is really cool.

Best of luck with all the complexity still ahead, and props for what you
all have achieved so far!

Best,
LF
_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
