Re: [Wikitech-l] Transcluding non-text content as HTML on wikitext pages

Daniel Kinzler Wed, 14 May 2014 06:23:43 -0700

On 14.05.2014 15:11, Gabriel Wicke wrote:
> On 05/14/2014 01:40 PM, Daniel Kinzler wrote:
>>> This means that HTML returned from the preprocessor needs to be valid in
>>> wikitext to avoid being stripped out by the sanitizer. Maybe that's actually
>>> possible, but my impression is that you are shooting for something that's
>>> closer to the behavior of a tag extension. Those already bypass the
>>> sanitizer, so would be less troublesome in the short term.
>>
>> Yes. Just treat <html>...</html> like a tag extension, and it should work
>> fine. Do you see any problems with that?
> 
> First of all you'll have to make sure that users cannot inject <html> tags
> as that would enable arbitrary XSS. I might have missed it, but I believe
> that this is not yet done in your current patch.


My patch doesn't change the handling of <html>...</html> by the parser. As
before, the parser will pass HTML code in <html>...</html> through only if
wgRawHtml is enabled, and will mangle/sanitize it otherwise.
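
As a rough illustration of that gating (a Python sketch, not the actual PHP
parser code; the function name, the regex, and the use of html.escape as the
"sanitize" step are all assumptions made for this example):

```python
import html
import re

# Matches a literal <html>...</html> block in the input text.
HTML_TAG = re.compile(r"<html>(.*?)</html>", re.DOTALL | re.IGNORECASE)


def render_html_tags(wikitext: str, raw_html_enabled: bool) -> str:
    """Pass <html> bodies through only when raw HTML is enabled.

    `raw_html_enabled` is a hypothetical stand-in for the wgRawHtml setting.
    When it is off, the whole tag is escaped so it shows up as plain text.
    """

    def replace(match: re.Match) -> str:
        if raw_html_enabled:
            # Trusted configuration: emit the inner HTML unchanged.
            return match.group(1)
        # Untrusted configuration: neutralize the tag and its contents.
        return html.escape(match.group(0))

    return HTML_TAG.sub(replace, wikitext)
```

With the flag on, "a <html><b>x</b></html> c" comes out as "a <b>x</b> c";
with it off, the tag and its body are escaped and displayed literally.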

My patch does mean, however, that the text returned by expandtemplates may not
render as expected when processed by the parser. Perhaps anomie's approach of
preserving the original template call would work, something like:

  <html template="{{T}}">...</html>

Then, the parser could apply the normal expansion when encountering the tag,
ignoring the pre-rendered HTML.
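
A minimal sketch of that idea, in Python for brevity (the regex, the function
name, and the `expand` callback are hypothetical; the real parser hook would
look different):

```python
import html
import re
from typing import Callable

# Matches <html template="...">...</html>; group 1 is the preserved template
# call, group 2 is the pre-rendered HTML that will be discarded.
TEMPLATE_TAG = re.compile(r'<html\s+template="([^"]*)">(.*?)</html>', re.DOTALL)


def reexpand_templates(text: str, expand: Callable[[str], str]) -> str:
    """Replace each tagged block with a fresh expansion of its template call.

    `expand` is a hypothetical callback standing in for the parser's normal
    template expansion.
    """

    def replace(match: re.Match) -> str:
        # Undo attribute escaping to recover the original call, e.g. {{T}}.
        template_call = html.unescape(match.group(1))
        # The pre-rendered body (group 2) is ignored on purpose.
        return expand(template_call)

    return TEMPLATE_TAG.sub(replace, text)
```

The pre-rendered body never reaches the output here, so whether to trust it
does not come up on this path; only the preserved template call is used.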

> In contrast to normal tag extensions <html> would also contain fully
> rendered HTML, and should not be piped through action=parse as is done in
> Parsoid for tag extensions (in absence of a direct tag extension expansion
> API end point). We and other users of the expandtemplates API will have to
> add special-case handling for this pseudo tag extension.

Handling for the <html> tag should already be in place, since it's part of the
core spec. The issue is only to know when to allow/trust such <html> tags, and
when to treat them as plain text (or like a <pre> tag).

> In HTML, the <html> tag is also not meant to be used inside the body of a
> page. I'd suggest using a different tag name to avoid issues with HTML
> parsers and potential name conflicts with existing tag extensions.

As above: <html> is part of the core syntax, to support $wgRawHtml. It's just
disabled by default.

> Overall it does not feel like a very clean way to do this. My preference
> would be to let the consumer directly ask for pre-expanded wikitext *or*
> HTML, without overloading action=expandtemplates. 

The question is how to represent non-wikitext transclusions in the output of
expandtemplates. We'll need an answer to this question in any case.

For the main purpose of my patch, expandtemplates is irrelevant. I added the
special mode that generates <html> specifically to have a consistent wikitext
representation for use by expandtemplates. I could simply disable it just as
well, so no expansion would apply for such templates when calling
expandtemplates (as is done for special page inclusion).

> Even indicating the
> content type explicitly in the API response (rather than inline with an HTML
> tag) would be a better stop-gap as it would avoid some of the security and
> compatibility issues described above.

The content type did not change. It's wikitext.

-- daniel

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
