Thank you so much for the very extensive reply! That's all I hoped to get, and so much more. Much appreciated!
On Sun, 12 Jan 2020 at 15:37, Subramanya Sastry <[email protected]> wrote:

> On 1/12/20 5:33 PM, Lord_Farin wrote:
>
> > Hi Wikitech,
> >
> > I've been catching up on the recent achievements regarding Parsoid/PHP,
> > well done!
>
> Thanks!
>
> The switchover of wikitext engines is going to take some time. I would
> be surprised if we got all the ducks lined up before 18 months from now
> -- we have a bunch of work to do still.
>
> Other details below:
>
> > With WMF sites being migrated, of course non-WMF sites start to creep
> > into the picture. As I'm involved in running one of those, I'm curious
> > to know if and how you are going to support this upgrade. I've read
> > about Linter and ParserMigration but I'm not clear on how they fit into
> > the picture.
>
> We built the Linter and ParserMigration extensions to support the
> replacement of HTML4 Tidy with RemexHTML [1]. We anticipate leveraging
> those in our efforts to consolidate behind Parsoid (post-unification
> work) as the default wikitext engine for MediaWiki. We don't quite know
> the specifics yet. My hunch is that this replacement is going to be most
> complex for Wikimedia wikis, and I expect most 3rd party wikis to have a
> much easier time switching over.
>
> > I'm asking specifically because we are running some custom extensions
> > which will probably break with the advent of Parsoid/PHP.
> > At present we are running MW 1.33 on PHP 7.0, but we are not using VE.
> > It would be fine if we as a maintenance team have to invest some (or
> > even considerable) time and effort but I would like to know the size of
> > the endeavour beforehand...
>
> One of the changes that will take some work is how extensions interact
> with the parser (Parsoid in the future). So far, this happens through
> access to the Parser object as well as through parser hooks. However, in
> the Parsoid regime, this model will change.
> While the details are yet to be finalized and we are yet to publish the
> first draft for review (likely in the next couple of months), here is how
> we've been thinking about this:
>
> 1. Extensions will no longer have direct access to the parser itself --
> all interaction will be through an API / interface.
>
> 2. Hooks are unlikely to be based on timelines of how wikitext passes
> through the parser, i.e. before something happens, or after something
> happens. We are going to move more towards a pure functional model as
> far as possible. So, as far as extension tags are concerned, they get
> access to the tag source, args, and possibly some other information and
> are expected to return output HTML / a DOM fragment (here they will
> leverage the parser API/interface I mentioned in 1. above). Most
> extensions that implement custom tags already behave in this manner and
> this simply formalizes that.
>
> 3. Some extensions set parser state and update it across invocations. We
> currently have no intention of supporting that. We are going to look at
> what the underlying need is that is being modeled through side-effects /
> state and will try to provide first-class support for that in some
> manner. For example, some (like Cite) use state for enumeration and
> numbering purposes, and this can be done as a post-processing pass on the
> DOM when they get to inspect the "final" DOM. Presumably these global
> document processors are the exception, not the norm. But statelessness
> lets us process the document in arbitrary order (or even skip processing
> parts of the document by reusing extension/template/media output from
> previous versions of the document), and use the final post-processing
> step as the synchronization step to enforce source-text ordering (like
> numbering).
>
> We anticipate most extensions are going to need some (hopefully minor)
> changes. If your extension doesn't deal with wikitext itself, the
> changes are probably going to be relatively minor.
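Points 2 and 3 above can be illustrated with a minimal sketch. It is written in Python purely for illustration (Parsoid itself is PHP), and every name in it (Node, ref_handler, expand_in_any_order, number_refs) is hypothetical: none of it is the actual ParsoidExtensionAPI, whose interfaces are explicitly not final yet. It assumes a toy tree in place of a real DOM. It shows a Cite-like tag handler as a pure function from (tag source, args) to a fragment, fragments expanded in arbitrary order, and numbering deferred to a single post-processing pass over the composed document, which restores source-text order.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


# Hypothetical toy stand-in for a DOM node; real Parsoid operates on an actual DOM.
@dataclass
class Node:
    tag: str
    attrs: Dict[str, str] = field(default_factory=dict)
    text: str = ""
    children: List["Node"] = field(default_factory=list)


# Principle 2 (as sketched here): a tag handler maps (tag source, args) to a
# fragment and neither reads nor writes any shared parser state.
TagHandler = Callable[[str, Dict[str, str]], Node]


def ref_handler(source: str, args: Dict[str, str]) -> Node:
    """Hypothetical Cite-like <ref> handler: emits an unnumbered marker
    that carries the footnote text. It does not assign a number."""
    return Node("sup", {"class": "ref", "data-note": source})


def expand_in_any_order(
    invocations: List[Tuple[str, Dict[str, str]]], handler: TagHandler
) -> List[Node]:
    """Expand each tag independently. Since handlers are stateless, the
    evaluation order does not matter; reverse order is used on purpose."""
    results: Dict[int, Node] = {}
    for i in reversed(range(len(invocations))):
        source, args = invocations[i]
        results[i] = handler(source, args)
    # Compose the fragments back in source order.
    return [results[i] for i in range(len(invocations))]


def number_refs(doc: Node) -> Node:
    """Principle 3 (as sketched here): numbering happens in one
    post-processing pass over the composed document, in source order."""
    notes: List[Node] = []

    def walk(node: Node) -> None:
        if node.tag == "sup" and node.attrs.get("class") == "ref":
            notes.append(Node("li", text=node.attrs["data-note"]))
            node.text = f"[{len(notes)}]"
        for child in node.children:
            walk(child)

    walk(doc)
    # Append the collected notes as a reference list at the end.
    doc.children.append(Node("ol", {"class": "references"}, children=notes))
    return doc


if __name__ == "__main__":
    invocations = [("First note", {}), ("Second note", {})]
    body = Node("body", children=expand_in_any_order(invocations, ref_handler))
    doc = number_refs(body)
    print([n.text for n in doc.children[:2]])            # ['[1]', '[2]']
    print([li.text for li in doc.children[2].children])  # ['First note', 'Second note']
```

Because ref_handler keeps no state, expanding in reverse (or reusing a fragment cached from an earlier revision) does not change the result; only number_refs, which runs once over the final document, depends on source order.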
> But, if your extension deals with wikitext, then it might need an update
> in terms of how it generates its output (using the ParsoidExtensionAPI
> interface instead of an actual parser object), but once again, these are
> unlikely to be very significant changes. However, if your extension
> maintains state across invocations, then it might need some rethinking
> (as stated in 3. above).

As our extension currently uses some hacks to get correct behaviour when
modifying the ToC, I guess the new model will make this a fair bit easier.
As stated, it will no longer be necessary to work with wikitext; instead, a
hook at the post-processing level should be sufficient. This also makes all
the templating much easier. In summary, I'm looking forward to the exact
implementation, and while I expect considerable work, it will be nowhere
near the complexity of what we have currently built: just work migrating
from A to B.

> If you want to get a really early look, you can poke around the Parsoid
> repo and its reimplementation of a few extensions [2]. But note that we
> still have some work to do to (a) clean up the interfaces, (b) untangle
> them further from Parsoid's internals and (c) make sure our design is
> consistent with Tim's proposed work around hooks in general [3] [4]. So,
> what you see in the Parsoid repo today may not be what it will look like
> in the end (in terms of exact interfaces: names, methods, signatures),
> but they will nevertheless operate within the constraints / principles
> 1-3 above.
>
> As a long-term goal, we are trying to nudge wikitext (including
> templates, extensions) towards one where the final output is a
> composition of largely independent fragments (no matter who/what
> generated those fragments) with some mostly minor post-processing after
> the document is composed. An updated extension and parser hooks API
> during the switch to Parsoid is one of the first steps.
> Balanced / Typed templates will be the next step in that direction [5].
>
> Hope this helps in planning early. Thanks for asking; it nudged me to
> outline our thinking early, even before we have the publishable first
> draft of our updated extension model.
>
> Subbu (on behalf of the Parsing Team).
>
> [1] https://blog.wikimedia.org/2018/07/09/tidy-html5-replacement/
>
> [2] https://github.com/wikimedia/parsoid/tree/master/src/Ext
>
> [3] https://lists.wikimedia.org/pipermail/wikitech-l/2019-December/092867.html
>
> [4] https://phabricator.wikimedia.org/T240307
>
> [5] https://phabricator.wikimedia.org/T114445

Reading some more context (some of which I had been following before) and
seeing it come together into a coherent plan of action is really cool. Best
of luck with all the complexity still ahead, and props for what you all have
achieved so far!

Best,
LF

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
