[Wikitech-l] Re: [Wikitext-l] Parsoid template transclusion behavior

Subramanya Sastry Sun, 18 Feb 2024 11:02:12 -0800

[ Resending since I forgot to copy all lists -- please don't mind the duplicate response on wikitext-l. ]

Our primary goal with Parsoid today is to ensure maximum compatibility with the current default parser -- without that, it would be impossible to switch over to Parsoid for all page rendering use cases.

But, at the core, Parsoid's design has always pursued a processing model where content (fragment) generators (whether templates, extensions, parser functions, or in the future wiki functions or other page components) are decoupled from the page where they are embedded. This lets us process them independently and incorporate those generated fragments efficiently. Parsoid uses this model for extensions already. But, that model hasn't held up for templates as they are implemented today because of how they are used and what they generate (snippets of text that can be full or partial attributes, mix of attributes and content, parts of tables) -- table use cases being the most egregious of those.

So given these practical realities, the simplest course of action for us to handle templates today is to have them be fully expanded as textual strings and do additional processing within Parsoid. But, Parsoid still is able to clearly demarcate page content that comes from templates (and other content generators) even where the template content combines with page level content in some complex ways (some caused by table content markup errors causing content fostering -- a source of unnecessary complexity and headaches for us).

Our goal is to start moving towards the original decoupled processing model for templates as well, but only after we are able to switch over to Parsoid more fully and that is looking closer than ever at this point. But, that is going to be a gradual evolution -- there are various proposals we have considered in the past here, but typing is probably the overarching concept that ties all those ideas together.

Hope that answers your primary question. Some additional tangential details below while I am at it.

<tangent>All that said, I wouldn't invest too much time analyzing the contents of that page and the notions of single-pass or multi-pass or PEG vs not-PEG, etc. Those are somewhat immaterial implementation details. I am not sure I would describe Parsoid as a single-pass model today. It is single-pass in only so far as it processes the textual string in one pass. But, otherwise, the generated tokens are processed multiple times as they are transformed. The DOM that is built up is processed multiple times ... so, if anything, Parsoid has a lot more (20+) passes. Separately, given that we cannot really process the wikitext stream to a fully processed semantic tree (because of the nature of wikitext), we could have used other ways of generating tokens along with corresponding token transformers to get the same end result. Since it is mostly water under the bridge now, we haven't really investigated the route of how this might have looked if we had used traditional LALR techniques (as long as we realize the output of that grammar would just be a different set of tokens, not a conventional AST). I am mostly mentioning this tangent to emphasize that our goal here is not to arrive at a formal (implementation) grammar in the traditional programming language sense, but rather to transition to a different (decoupled / typed) processing model while preserving compatibility in the interim and while giving us feasible migration paths to that model.</tangent>


Subbu.

On 2/16/24 23:10, psnbaotg wrote:

Hello,
I have recently been researching Parsoid's design as MW is migrating to Parsoid. I found out that due to its single-pass tokenizing design, templates are not handled textually as the legacy parser does.
This is good as the HTML now has information about which template it was transcluded from. However, https://www.mediawiki.org/wiki/Parsoid/limitations says "We have since decided to use the PHP preprocessor for template expansions, which side-steps these issues by reverting to the traditional textual preprocessor pass". Is this still true now?
Best regards,
Diskdance

_______________________________________________
Wikitext-l mailing list -- wikitex...@lists.wikimedia.org
To unsubscribe send an email to wikitext-l-le...@lists.wikimedia.org

_______________________________________________
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
