Re: [Wikitext-l] WikiDom serializers

Gabriel Wicke Mon, 07 Nov 2011 10:59:47 -0800

Hello Trevor and list,

since last week's integration into the VisualEditor extension, things have
been progressing well. Lists (including definition lists) and tables are now
parsed to WikiDom and rendered by the HTML serializer. The parser is still
quite rough and in flux at this stage, but the general structures mostly work
when running the parser tests under node.js. The Wikitext serializer is not
yet wired up, as I am currently concentrating on the parser and its WikiDom
output, but it will be added for round-trip testing.


Apart from general grammar tweaking, I am now working on converting inline
elements into WikiDom annotations. The main challenge is the calculation of
plain-text offsets. I am trying to avoid building an intermediate structure,
but might fall back to one if things get too messy when interleaving this
calculation with parsing.
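
To illustrate the offset calculation, here is a minimal sketch of the
fallback variant, which walks an intermediate inline tree after parsing. The
node shapes, the annotation layout ({ type, range: { start, end } }) and the
name flattenInline are assumptions made for this example, not the actual
parser or WikiDom API.

```javascript
// Hypothetical input: a string is plain text; an object { type, children }
// is an inline element such as bold or italic.
function flattenInline(nodes, out) {
    out = out || { text: '', annotations: [] };
    nodes.forEach(function (node) {
        if (typeof node === 'string') {
            out.text += node;
            return;
        }
        // Record the start offset before descending, so that annotations
        // stay in document order; the end offset is filled in afterwards.
        var annotation = {
            type: node.type,
            range: { start: out.text.length, end: out.text.length }
        };
        out.annotations.push(annotation);
        flattenInline(node.children, out);
        annotation.range.end = out.text.length;
    });
    return out;
}

var flat = flattenInline([
    'a ',
    { type: 'bold', children: ['b', { type: 'italic', children: ['c'] }] },
    ' d'
]);
// flat.text is 'a bc d'; bold covers offsets 2..4, italic covers 3..4.
```

Doing the same thing inside the grammar actions would mean threading the
running text length through the parse, which is where the interleaving gets
messy.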

>>    - es.AnnotationSerializer needs some nesting smartness, so
>>    that overlapped regions open and close properly (<b>a<i>b</b>c</i>
>>    should be <b>a<i>b</i></b><i>c</i> - es.ContentView does this
>>    correctly but is working from the linear data model)

Parsing these overlapping annotations is not well supported right now, but
should be doable using a multi-pass or shallow (token-only) parsing strategy
for inline content. Pushing nesting and content-model fix-ups to the
serializer should also make it easier to approximate the parsing rules in the
HTML5 specification [1] without forcing too much normalization. For the most
part, the HTML5 parsing spec is a somewhat more systematic version of what
tidy does right now after the MediaWiki parser has tried its best.
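
As a rough sketch of the nesting fix-up in the serializer, the following
closes and reopens tags so that overlapping ranges come out properly nested.
The annotation layout ({ tag, range: { start, end } }) and the function names
are assumptions for illustration; this is not the es.AnnotationSerializer
code, and it assumes all ranges lie within the text.

```javascript
function escapeHTML(s) {
    return s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

function serializeAnnotated(text, annotations) {
    var html = '', stack = [], i, j, lowest, reopen, a;
    function open(annotation) {
        html += '<' + annotation.tag + '>';
        stack.push(annotation);
    }
    for (i = 0; i <= text.length; i++) {
        // Find the lowest open annotation that ends at this offset.
        lowest = -1;
        for (j = 0; j < stack.length; j++) {
            if (stack[j].range.end === i) { lowest = j; break; }
        }
        if (lowest !== -1) {
            // Close everything above it too, then reopen what continues.
            reopen = [];
            while (stack.length > lowest) {
                a = stack.pop();
                html += '</' + a.tag + '>';
                if (a.range.end > i) { reopen.unshift(a); }
            }
            reopen.forEach(open);
        }
        // Open annotations starting here, longest first to reduce splitting.
        annotations.filter(function (ann) {
            return ann.range.start === i && ann.range.end > i;
        }).sort(function (x, y) {
            return y.range.end - x.range.end;
        }).forEach(open);
        if (i < text.length) { html += escapeHTML(text.charAt(i)); }
    }
    return html;
}

var nested = serializeAnnotated('abc', [
    { tag: 'b', range: { start: 0, end: 2 } },
    { tag: 'i', range: { start: 1, end: 3 } }
]);
// nested is '<b>a<i>b</i></b><i>c</i>'
```

The character-by-character scan keeps the sketch short; a real serializer
would more likely jump between range boundaries.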

Some early fix-ups seem to be needed to allow proper editing, in particular
of block-level elements, so I am currently a bit sceptical about avoiding
normalization completely.

>>    - We need some sort of context that can be asked for the HTML of a
>>    template, whether a page exists, etc. Initially this work is all done
>>    on the client, which means this is a wrapper for a lot of API calls,
>>    but either way, having a firm API between the renderer and the site
>>    context will help keep things clean and flexible

Brion has already implemented a simple context object for transclusion tests.
This is probably not yet the final API, but it is already a good start.
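
To make the idea of a renderer/site-context boundary concrete, here is a
hypothetical in-memory context of the kind that is handy in tests. The names
StaticContext, pageExists, templateHTML and linkClass are invented for this
sketch and are not Brion's implementation. Callbacks are used because a
client-side context would wrap asynchronous API calls.

```javascript
// Hypothetical test context backed by plain objects instead of API calls.
function StaticContext(pages, templates) {
    this.pages = pages || {};
    this.templates = templates || {};
}

function has(obj, key) {
    return Object.prototype.hasOwnProperty.call(obj, key);
}

// Reports whether a page exists via callback(err, exists).
StaticContext.prototype.pageExists = function (title, callback) {
    callback(null, has(this.pages, title));
};

// Delivers the rendered HTML of a template via callback(err, html).
StaticContext.prototype.templateHTML = function (name, callback) {
    if (!has(this.templates, name)) {
        callback(new Error('No such template: ' + name));
        return;
    }
    callback(null, this.templates[name]);
};

// A renderer only talks to the context, never to the site directly.
// Missing pages get the "new" class, as MediaWiki does for red links.
function linkClass(context, title, callback) {
    context.pageExists(title, function (err, exists) {
        if (err) { callback(err); return; }
        callback(null, exists ? '' : 'new');
    });
}

var context = new StaticContext(
    { 'Main Page': true },
    { 'Hello': '<b>Hello</b>' }
);
```

A context backed by API calls could expose the same methods, so the renderer
would not need to change.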

Gabriel

[1]: HTML5 parsing spec: http://dev.w3.org/html5/spec/Overview.html#parsing


_______________________________________________
Wikitext-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitext-l
