This looks great - thanks for putting together a rundown of our new
direction.

Any ideas about what happens if a parser hook, parser function or template
resolves to just plain text, without any wrapping HTML? Where does the
microdata get stored? If we wrap it, how do we decide what to wrap it in?

- Trevor

On Thu, Feb 2, 2012 at 9:38 AM, Gabriel Wicke <[email protected]> wrote:

> We, the Visual Editor team, have decided to move away from the custom
> WikiDom format in favor of plain HTML DOM, which is already used
> internally in the parser. The mapping of WikiText to the DOM was very
> pragmatic so far, but now needs to be cleaned up before being used as an
> external interface. Here are a few ideas for this.
>
> Wikitext can be divided into shorthand notation for HTML elements and
> higher-level features like templates, media display or categories.
>
> The shorthand portion of wikitext maps quite directly to an HTML DOM.
> Details like the handling of unbalanced tags while building the DOM
> tree, remembering extra whitespace or wiki vs. html syntax for
> round-tripping need to be considered, but appear to be quite manageable.
> This should be especially true if some normalization in edge cases can
> be tolerated. We plan to localize normalization (and thus mostly avoid
> dirty diffs) by serializing only modified DOM sections while using the
> original source for unmodified DOM parts. Attributes are used to track
> the original source offsets of DOM elements.
>
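To make the selective serialization idea above concrete, here is a minimal
Python sketch. It assumes a flat list of non-overlapping nodes that each
carry their original source offsets. The names Node, new_text and serialize
are hypothetical and not taken from the JS parser; new_text stands in for
whatever a real wikitext serializer would produce for a modified node.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    start: int              # offset of the node's first character in the source
    end: int                # offset one past the node's last character
    modified: bool = False  # set by the editor when the node was changed
    new_text: str = ""      # stand-in for freshly serialized wikitext


def serialize(source: str, nodes: List[Node]) -> str:
    """Reuse the original source for unmodified nodes, re-serialize the rest."""
    out = []
    pos = 0
    for node in sorted(nodes, key=lambda n: n.start):
        # Text between tracked nodes is copied verbatim.
        out.append(source[pos:node.start])
        if node.modified:
            out.append(node.new_text)
        else:
            out.append(source[node.start:node.end])
        pos = node.end
    out.append(source[pos:])
    return "".join(out)


source = "''a''  and '''b'''\n"
nodes = [Node(0, 5), Node(11, 18, modified=True, new_text="'''c'''")]
result = serialize(source, nodes)
```

Only the second node is re-serialized here, so the unusual double space
after the first node survives untouched, which is the "no dirty diffs"
property the paragraph describes.
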
> Higher-level features can be represented in the HTML DOM using different
> extension mechanisms:
>
> * Introduce custom elements with specific attributes:
>  <template href="Template:Bla" args="..."/>
>  For display or WYSIWYG editing these elements then need to be
>  expanded with the template contents, thumbnail html and so on.
>  Unbalanced templates (table start/row/end) are very difficult
>  to expand.
>
> * Expand higher-level features to their presentational DOM, but
>  identify and annotate the result using custom attributes. This is the
>  approach we have taken so far in the JS parser [1]. Template
>  arguments and similar information are stored as JSON in data
>  attributes, which made their conversion to the JSON-based WikiDom
>  format quite easy.
>
> Both are custom solutions for internal use. For an external interface, a
> standardized solution would be preferable. HTML5 microdata [2] seems to
> fit our needs quite well.
>
> Assuming a template that expands to a div and some content, this would
> be represented like this:
>
> <div itemscope
>    itemtype='http://en.wikipedia.org/wiki/Template:Sometemplate' >
>    <h2>A static header from the template</h2>
>    <!-- The template argument 'name', expanded in the template -->
>    <p itemprop='name' content='The wikitext name'>The rendered name</p>
> </div>
>
> In this case, an expanded template argument within (for example) an
> infobox is identified inside the template-provided HTML structure, which
> could enable in-place editing.
>
> Unused arguments (which are not found in the template expansion) or
> unexpanded templates can be represented using non-displaying meta elements:
>
> <div itemscope
>    itemtype='http://en.wikipedia.org/wiki/Template:Sometemplate'
>    id='uid-1' >
>    <h2>A static header from the template</h2>
>    <!-- The template argument 'name', expanded in the template -->
>    <p itemprop='name' content='The wikitext name'>The rendered name</p>
>    <meta itemprop='firstname' content='The wikitext firstname'>
> </div>
>
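As a rough illustration of how a consumer could read the template arguments
back out of markup like the example above, here is a small sketch using
Python's standard html.parser module. The class name TemplateArgs and the
shape of the collected dicts are assumptions made for this example. It also
simplifies the microdata model: every itemprop is attached to the most
recently opened itemscope, so nested items and itemref are not handled.

```python
from html.parser import HTMLParser


class TemplateArgs(HTMLParser):
    """Collect itemprop/content pairs for each itemscope element."""

    def __init__(self):
        super().__init__()
        self.items = []  # one dict per itemscope element, in document order

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # valueless attributes such as itemscope map to None
        if "itemscope" in a:
            self.items.append(
                {"type": a.get("itemtype"), "id": a.get("id"), "args": {}}
            )
        elif "itemprop" in a and self.items:
            # Simplification: attach to the most recent itemscope.
            self.items[-1]["args"][a["itemprop"]] = a.get("content")


html = """
<div itemscope
   itemtype='http://en.wikipedia.org/wiki/Template:Sometemplate'
   id='uid-1' >
   <h2>A static header from the template</h2>
   <p itemprop='name' content='The wikitext name'>The rendered name</p>
   <meta itemprop='firstname' content='The wikitext firstname'>
</div>
"""

parser = TemplateArgs()
parser.feed(html)
template = parser.items[0]
```

After feeding the example, template["args"] should hold the original
wikitext for both the displayed argument (name) and the unused one
(firstname), since the non-displaying meta element is read the same way.
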
> The itemref mechanism can be used to tie together template data from a
> single template that does not expand to a single subtree:
>
> <div itemscope itemref='uid-1'>
>  <!-- Some more template output from expansion of
>       http://en.wikipedia.org/wiki/Template:Sometemplate -->
> </div>
>
> The itemtype attributes in these examples all point to the template
> location, which normally contains plain-text documentation of the
> template parameters and their semantics. The most common application of
> microdata, however, references standardized schemas, often from
> http://schema.org, as those are understood by Google [3], Microsoft, and
> Yahoo!. A mapping of semi-structured template arguments to a standard
> schema is possible as demonstrated by http://dbpedia.org/. It appears to
> be feasible to provide a similar mapping directly as microdata within
> the template documentation, which could then potentially be used to add
> standard schema information to regular HTML output when rendering a page.
>
> The visual editor could also use schema information to customize the
> editing experience for templates or images. Inline editing of fields in
> infoboxes with schema-based help is one possibility, but in other cases
> a popup widget might be more appropriate. Additional microdata in
> template documentation sections could provide layout or other UI
> information for these widgets.
>
> There are still quite a few loose ends, but I think the general
> direction of reusing standards as far as possible and hooking into the
> thriving HTML5 ecosystem has many advantages. It allows us to reuse
> quite a few libraries and infrastructure, and makes our own developments
> (and data of course) more useful to others.
>
> So, I hope you made it here without falling asleep!
>
> What do you think about these ideas?
>
> Gabriel
>
> References:
> [1]: http://www.mediawiki.org/wiki/Future/Parser_development
> [2]: http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html
> [3]: http://support.google.com/webmasters/bin/answer.py?hl=en&answer=99170&topic=21997&ctx=topic
>
> This text is on the wiki at
> http://www.mediawiki.org/wiki/Future/HTML5_DOM_with_microdata
>
>
> _______________________________________________
> Wikitext-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitext-l
>